Update README.md
README.md
CHANGED
@@ -39,8 +39,8 @@ Segment-NT-multi-species has been shown to generalize up to sequences of 50,000
 the `rescaling_factor` of the Rotary Embedding layer in the esm model `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference
 (i.e 6669 for a sequence of 40008 base pairs) and `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
 
-
-The
+
+The `./inference_segment_nt.ipynb` notebook shows how to set the rescaling factor and infer on a 50kb sequence of the human chromosome 20 in order to reproduce Fig.3 of the
 paper.
 
 ```python