JunxiongWang committed on
Commit 1d7ace7 · verified · 1 Parent(s): 9a0ea0d

Update README.md

Files changed (1)
  1. README.md +44 -1
README.md CHANGED
@@ -2,4 +2,47 @@
  license: apache-2.0
  ---
 
- Train in 30B Byte. Mode size 353M. Table 2 in [MambaByte](https://arxiv.org/abs/2401.13660)
+ Trained on 30B bytes. Model size: 353M parameters. See Table 2 in [MambaByte](https://arxiv.org/abs/2401.13660)
+
+ To use the model:
+
+ ```
+ import torch
+ import numpy as np
+
+ from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
+
+ model = MambaLMHeadModel.from_pretrained("JunxiongWang/MambaByte_Arxiv", device='cuda', dtype=torch.bfloat16)
+
+ text = r"\documentclass[12pt]{article}"  # raw string, since "\d" is not a valid escape sequence
+ text_byte = np.frombuffer(text.encode('utf-8'), dtype=np.uint8).copy()  # one token per byte; copy() makes the array writable
+ input_ids = torch.from_numpy(text_byte[None, :]).long().cuda()  # shape (1, seq_len)
+
+ sample = model.generate(
+     input_ids=input_ids,
+     max_length=2048,
+     cg=True,
+     return_dict_in_generate=True,
+     output_scores=True,
+     enable_timing=True,
+     temperature=1,
+     top_k=256,
+     top_p=0.9,
+ )
+ # Sampled bytes are not guaranteed to form valid UTF-8, so undecodable bytes are replaced.
+ print(bytes(sample.sequences[0].tolist()).decode('utf-8', errors='replace'))
+ ```
+
+ Output:
+
+ ```
+ \documentclass[12pt]{article}}}}^{{\mathbf{P}}\uplus{\mathbf{Q}}}}}}}{}}$ is a symmetric poset. This implies that $$\operatorname{end}({\mathscr{L}}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{P}}}\}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{Q}}}\}) = \operatorname{end}({\mathscr{L}}\setminus\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}),$$ i.e., ${\mathscr{L}}$ is $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic for any ${\mathbf{P}}\neq{\mathbf{Q}}$. Thus, ${\mathscr{L}}$ is reversible, and is in fact maximal among all $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic matrices.
+
+ Since ${\mathscr{L}}$ is in the same class as $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$, we have $\operatorname{end}({\mathscr{L}})\subseteq\operatorname{end}({\mathscr{L}})$. Conversely, if $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$, then $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is maximal in $\operatorname{end}({\mathscr{L}})$. Since ${\mathbf{P}}\setminus\{\sigma_{{\mathbf{P}}}\}\subseteq\operatorname{end}({\mathscr{L}})$, this implies that ${\mathscr{L}}$ is in the same class as $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$, and hence ${\mathscr{L}}$ is reversible.
+
+ We are now ready to show that $\{\sigma_{{\mathbf{P}}},\sigma_{{\mathbf{Q}}}\}$-bistochastic matrices form a symmetric poset of ends.
+
+ \[lem:end\_symm\_class\] Let ${\mathbf{P}},{\mathbf{Q}}\in{\mathscr{M}}$. Then $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is symmetric if and only if $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$.
+
+ Suppose that $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$, and we prove that $\sigma_{{\mathbf{P}}}^{{\mathbf{Q}}}$ is symmetric. Clearly, $\operatorname{end}({\mathscr{L}})$ contains exactly the ends of $\operatorname{end}({\mathscr{L}})$ by definition, and the only case that survives is when $\operatorname{end}({\mathscr{L}})=\operatorname{end}({\mathscr{L}})$. By construction, this means that $\sigma_{{\mathbf{P}}}
+ ```
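
The byte-level round trip in the usage snippet above can be tried without loading the model. The following is a minimal sketch, assuming each token id is simply a raw UTF-8 byte value in the range 0..255, as the snippet suggests. The helper names `encode_bytes` and `decode_bytes` are hypothetical and are not part of `mamba_ssm`.

```python
def encode_bytes(text: str) -> list[int]:
    """Map text to byte-level token ids (one id per UTF-8 byte, 0..255)."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: list[int]) -> str:
    """Map byte-level token ids back to text.

    Sampled output can stop in the middle of a multi-byte character,
    so undecodable bytes are replaced instead of raising an error.
    """
    return bytes(ids).decode("utf-8", errors="replace")


# Raw string so the backslash is kept literally (hypothetical example prompt,
# same text as in the usage snippet).
prompt = r"\documentclass[12pt]{article}"
ids = encode_bytes(prompt)

print(len(ids), ids[:4])            # 29 [92, 100, 111, 99]
print(decode_bytes(ids) == prompt)  # True
```

Wrapping `ids` as `torch.tensor([ids])` would give the same `(1, seq_len)` layout as `input_ids` in the snippet above.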