InstaDeepAI/segment_nt_multi_species

@@ -8,13 +8,13 @@ tags:
 - genomics
 - segmentation
 ---
-# segment-nt-30kb-multi-species
-Segment-NT-30kb-multi-species is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
-elements in a sequence at a single nucleotide resolution. It is the result of finetuning the [Segment-NT-30kb](https://huggingface.co/InstaDeepAI/segment_nt_30kb) model on a dataset encompassing the human genome
 but also the genomes of 5 selected species: mouse, chicken, fly, zebrafish and worm.
-For the finetuning on the multi-species genomes, we curated a dataset of a subset of the annotations used to train **Segment-NT-30kb**, mainly because only this subset of annotations is
 available for these species. The annotations therefore concern the 7 main gene elements available from Ensembl [REF], namely protein-coding gene, 5’UTR, 3’UTR, intron, exon,
 splice acceptor and donor sites.
@@ -59,8 +59,8 @@ features = [
     "promoter_Tissue_invariant",
   ]
-tokenizer = AutoTokenizer.from_pretrained("InstaDeepAI/segment_nt_30kb_multi_species", trust_remote_code=True)
-model = AutoModel.from_pretrained("InstaDeepAI/segment_nt_30kb_multi_species", trust_remote_code=True)
 # Choose the length to which the input sequences are padded. By default, the
 # model max length is chosen, but feel free to decrease it as the time taken to
@@ -100,7 +100,7 @@ print(f"Intron probabilities shape: {probabilities_intron.shape}")
 ## Training data
-The **segment-nt-30kb-multi-species** model was finetuned on human, mouse, chicken, fly, zebrafish and worm genomes. For each specie, a subset of chromosomes is kept as
 validation for training monitoring and test for final evaluation.
 ## Training procedure

 - genomics
 - segmentation
 ---
+# segment-nt-multi-species
+Segment-NT-multi-species is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomic
+elements in a sequence at single-nucleotide resolution. It is the result of finetuning the [Segment-NT](https://huggingface.co/InstaDeepAI/segment_nt) model on a dataset encompassing not only the human genome
 but also the genomes of 5 selected species: mouse, chicken, fly, zebrafish and worm.
+For the finetuning on the multi-species genomes, we curated a dataset from a subset of the annotations used to train **Segment-NT**, mainly because only this subset of annotations is
 available for these species. The annotations therefore concern the 7 main gene elements available from Ensembl [REF], namely protein-coding gene, 5’UTR, 3’UTR, intron, exon,
 splice acceptor and donor sites.
     "promoter_Tissue_invariant",
   ]
+tokenizer = AutoTokenizer.from_pretrained("InstaDeepAI/segment_nt_multi_species", trust_remote_code=True)
+model = AutoModel.from_pretrained("InstaDeepAI/segment_nt_multi_species", trust_remote_code=True)
 # Choose the length to which the input sequences are padded. By default, the
 # model max length is chosen, but feel free to decrease it as the time taken to
 ## Training data
+The **segment-nt-multi-species** model was finetuned on human, mouse, chicken, fly, zebrafish and worm genomes. For each species, a subset of chromosomes is kept as
 validation for training monitoring and test for final evaluation.
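
The chromosome-level hold-out described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper `split_chromosomes`, the chromosome names, and the choice of held-out chromosomes are all hypothetical, since the model card does not state which chromosomes were used for each species.

```python
from typing import Dict, Iterable, List, Set


def split_chromosomes(
    chromosomes: Iterable[str],
    validation: Set[str],
    test: Set[str],
) -> Dict[str, List[str]]:
    """Assign each chromosome to exactly one of train / validation / test."""
    overlap = validation & test
    if overlap:
        # A chromosome used for both monitoring and final evaluation would leak.
        raise ValueError(f"Chromosomes in both validation and test: {sorted(overlap)}")

    splits: Dict[str, List[str]] = {"train": [], "validation": [], "test": []}
    for chrom in chromosomes:
        if chrom in test:
            splits["test"].append(chrom)
        elif chrom in validation:
            splits["validation"].append(chrom)
        else:
            splits["train"].append(chrom)
    return splits


# Hypothetical example: names and hold-out choices are illustrative only.
example_chromosomes = ["chr1", "chr2", "chr3", "chr4", "chr5"]
example_splits = split_chromosomes(
    example_chromosomes, validation={"chr4"}, test={"chr5"}
)
print(example_splits)
```

Splitting by whole chromosome, rather than by random windows, keeps overlapping or adjacent sequences from the same region out of both the training and evaluation sets. The same helper would be called once per species with that species' own chromosome list.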
 ## Training procedure