hdallatorre commited on
Commit
359ad7a
·
verified ·
1 Parent(s): f488e09

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
10
  ---
11
  # segment-nt-30kb
12
 
13
- Segment-NT-30kb is a segmentation model leveraging the Nucleotide Transformer (NT) DNA foundation model to predict the location of several types of genomics
14
  elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
15
  include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
16
  tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
@@ -86,7 +86,7 @@ The model was trained on a DGXH100 on a total of 23B tokens. The model was train
86
 
87
  ### Architecture
88
 
89
- The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-50m-multi-species) encoder, from which we removed
90
  the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
91
  blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
92
  to 562M.
 
10
  ---
11
  # segment-nt-30kb
12
 
13
+ Segment-NT-30kb is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
14
  elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
15
  include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
16
  tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
 
86
 
87
  ### Architecture
88
 
89
+ The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) encoder, from which we removed
90
  the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
91
  blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
92
  to 562M.