InstaDeepAI
/

segment_nt

Feature Extraction

Model card Files Files and versions Community

hdallatorre commited on Mar 6, 2024

Commit

359ad7a

·

verified ·

1 Parent(s): f488e09

Update README.md

Files changed (1) hide show

README.md +2 -2

README.md CHANGED Viewed

@@ -10,7 +10,7 @@ tags:
 ---
 # segment-nt-30kb
-Segment-NT-30kb is a segmentation model leveraging the Nucleotide Transformer (NT) DNA foundation model to predict the location of several types of genomics
 elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
 include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
 tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
@@ -86,7 +86,7 @@ The model was trained on a DGXH100 on a total of 23B tokens. The model was train
 ### Architecture
-The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-50m-multi-species) encoder, from which we removed
 the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
 blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
 to 562M.

 ---
 # segment-nt-30kb
+Segment-NT-30kb is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
 elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
 include gene (protein-coding genes, lncRNAs, 5’UTR, 3’UTR, exon, intron, splice acceptor and donor sites) and regulatory (polyA signal, tissue-invariant and
 tissue-specific promoters and enhancers, and CTCF-bound sites) elements.
 ### Architecture
+The model is composed of the [nucleotide-transformer-v2-50m-multi-species](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) encoder, from which we removed
 the language model head and replaced it by a 1-dimensional U-Net segmentation head [4] made of 2 downsampling convolutional blocks and 2 upsampling convolutional blocks. Each of these
 blocks is made of 2 convolutional layers with 1, 024 and 2, 048 kernels respectively. This additional segmentation head accounts for 53 million parameters, bringing the total number of parameters
 to 562M.