Update README.md
Browse files
README.md
CHANGED
@@ -10,7 +10,7 @@ tags:
|
|
10 |
---
|
11 |
# segment-nt-30kb
|
12 |
|
13 |
-
Segment-NT-30kb is a segmentation model leveraging the Nucleotide Transformer (NT) DNA foundation model to predict the location of several types of genomics
|
14 |
elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
|
15 |
include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
|
16 |
tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
|
@@ -86,7 +86,7 @@ The model was trained on a DGXH100 on a total of 23B tokens. The model was train
|
|
86 |
|
87 |
### Architecture
|
88 |
|
89 |
-
The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-
|
90 |
the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
|
91 |
blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
|
92 |
to 562M.
|
|
|
10 |
---
|
11 |
# segment-nt-30kb
|
12 |
|
13 |
+
Segment-NT-30kb is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
|
14 |
elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
|
15 |
include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
|
16 |
tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
|
|
|
86 |
|
87 |
### Architecture
|
88 |
|
89 |
+
The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) encoder, from which we removed
|
90 |
the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
|
91 |
blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
|
92 |
to 562M.
|