upload model weights, readme, and config
- README.md +76 -3
- config.json +41 -0
- model-with-instance-classifiers.safetensors +3 -0
- model.safetensors +3 -0
- torchscript_model.pt +3 -0
README.md
CHANGED
@@ -1,3 +1,76 @@
# Pancancer tissue classifier

This model classifies whole slide images into one of 32 cancer types from The Cancer Genome Atlas (TCGA). It was trained by Jakub Kaczmarzyk using CLAM.

Output classes: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS, UVM.

Please see the [TCGA study abbreviations](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations) to map these class names to TCGA study names.

## Data

The model was trained on TCGA diagnostic slides (identified by `DX` in the slide name). Each whole slide image was tiled into 128 × 128 µm patches, and each patch was encoded with CTransPath, which produces a 768-dimensional embedding per patch.
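
Because patches are defined in physical units (µm), their size in pixels depends on each slide's resolution in microns per pixel (MPP). The sketch below is a minimal illustration assuming non-overlapping square tiles on a regular grid with no tissue filtering; `patch_size_px` and `patch_grid` are hypothetical helpers, not part of this repository, and the tiling code actually used for training is not described here.

```python
PATCH_SIZE_UM = 128  # patch side length, matching "patch_size_um" in config.json


def patch_size_px(mpp: float, patch_size_um: float = PATCH_SIZE_UM) -> int:
    """Side length in pixels of a patch_size_um patch at the given microns per pixel."""
    if mpp <= 0:
        raise ValueError("mpp must be positive")
    return round(patch_size_um / mpp)


def patch_grid(width_px: int, height_px: int, mpp: float) -> list[tuple[int, int]]:
    """Top-left (x, y) pixel coordinates of non-overlapping patches that fit in the slide.

    Partial patches at the right and bottom edges are dropped.
    """
    size = patch_size_px(mpp)
    cols = width_px // size
    rows = height_px // size
    return [(c * size, r * size) for r in range(rows) for c in range(cols)]
```

For example, at 0.5 MPP a 128 µm patch is 256 px wide (512 px at 0.25 MPP), and a 10,000 × 8,000 px slide yields a 39 × 31 grid of 1,209 patches before any background filtering.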

Train, validation, and test splits were stratified by TCGA study, and no patient appears in more than one split.
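
One way to build splits with both properties is to shuffle patients within each study and assign whole patients, rather than individual slides, to a split. This is a minimal sketch, not necessarily the procedure used for this model; `patient_level_split`, the `(slide_id, patient_id, study)` input layout, and the 80/10/10 default fractions are assumptions.

```python
import random
from collections import defaultdict


def patient_level_split(slides, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign slides to train/val/test so that all slides of a patient share one split.

    slides: list of (slide_id, patient_id, study) tuples.
    Stratification: within each study, patients are shuffled and divided by `fractions`.
    """
    patients_by_study = defaultdict(set)
    for _, patient, study in slides:
        patients_by_study[study].add(patient)

    rng = random.Random(seed)
    split_of_patient = {}
    for study in sorted(patients_by_study):
        patients = sorted(patients_by_study[study])  # sort first for reproducibility
        rng.shuffle(patients)
        n_train = round(fractions[0] * len(patients))
        n_val = round(fractions[1] * len(patients))
        for i, patient in enumerate(patients):
            if i < n_train:
                split_of_patient[patient] = "train"
            elif i < n_train + n_val:
                split_of_patient[patient] = "val"
            else:
                split_of_patient[patient] = "test"

    # Every slide inherits its patient's split, so no patient crosses a boundary.
    splits = {"train": [], "val": [], "test": []}
    for slide_id, patient, _ in slides:
        splits[split_of_patient[patient]].append(slide_id)
    return splits
```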

Sample sizes:

- Train: 9,257 slides (7,633 patients)
- Validation: 1,186 slides (955 patients)
- Test: 1,163 slides (955 patients)
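
The counts above add up to 11,606 slides from 9,543 patients. A short calculation (the helper name is hypothetical) shows what split proportions they imply:

```python
# Slide and patient counts copied from the list above: (slides, patients).
SPLITS = {
    "train": (9_257, 7_633),
    "validation": (1_186, 955),
    "test": (1_163, 955),
}


def split_summary(splits):
    """Return {split: (percent of all slides, slides per patient)}, rounded."""
    total_slides = sum(slides for slides, _ in splits.values())
    return {
        name: (round(100 * slides / total_slides, 1), round(slides / patients, 2))
        for name, (slides, patients) in splits.items()
    }
```

This gives roughly 79.8% / 10.2% / 10.0% of slides for train / validation / test, with about 1.2 slides per patient in each split.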

## Model performance

The model achieved a weighted average AUROC of 0.99 (one-vs-rest).

Here are the one-vs-rest AUROC values for each TCGA study.

- ACC: 0.9993
- BLCA: 0.9814
- BRCA: 0.9908
- CESC: 0.9868
- CHOL: 0.9972
- COAD: 0.9927
- DLBC: 0.9996
- ESCA: 0.9571
- GBM: 0.9984
- HNSC: 0.9974
- KICH: 0.9998
- KIRC: 0.9993
- KIRP: 0.9952
- LGG: 0.9984
- LIHC: 0.9988
- LUAD: 0.9879
- LUSC: 0.9868
- MESO: 0.9961
- OV: 0.9900
- PAAD: 0.9897
- PCPG: 0.9944
- PRAD: 1.0000
- READ: 0.9752
- SARC: 0.9946
- SKCM: 0.9957
- STAD: 0.9932
- TGCT: 0.9957
- THCA: 1.0000
- THYM: 0.9991
- UCEC: 0.9971
- UCS: 0.9863
- UVM: 0.9997
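
The README does not say how the per-class values were weighted. A common convention, used for example by scikit-learn's `roc_auc_score(..., multi_class="ovr", average="weighted")`, weights each class's one-vs-rest AUROC by the number of true samples of that class. The pure-Python sketch below assumes that convention; `binary_auroc` and `weighted_ovr_auroc` are hypothetical names. It uses the fact that AUROC equals the probability that a random positive is scored above a random negative (ties count half), computed from ranks.

```python
def binary_auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores receive average ranks."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def weighted_ovr_auroc(probs, targets, num_classes):
    """Support-weighted mean of per-class one-vs-rest AUROCs.

    probs: one list of class scores per slide; targets: true class index per slide.
    Returns (weighted AUROC, {class index: AUROC}).
    """
    per_class = {}
    total = 0.0
    for c in range(num_classes):
        support = sum(1 for t in targets if t == c)
        if support == 0:
            continue  # a class absent from the evaluation set gets zero weight
        auc = binary_auroc([p[c] for p in probs], [t == c for t in targets])
        per_class[c] = auc
        total += support * auc
    return total / len(targets), per_class
```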

### Renal cell carcinoma (RCC) subtyping

RCC subtyping is a common benchmark task for slide-level classification, so we also evaluated this model on it.

On a subset of the test set containing 52 KIRC slides and 28 KIRP slides, the model achieved a balanced accuracy of 0.88.
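
Balanced accuracy is the mean of per-class recall, so with 52 KIRC and 28 KIRP slides each subtype contributes equally regardless of its slide count. A minimal sketch (the function name is hypothetical; scikit-learn provides the same metric as `balanced_accuracy_score`):

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true.

    A prediction of any other class (e.g. KICH on a KIRC slide) counts as an error.
    """
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / n for c, n in support.items()) / len(support)
```

With illustrative predictions (not this model's results) that get all 52 KIRC slides and 21 of 28 KIRP slides right, plain accuracy would be 73/80 = 0.9125, while balanced accuracy is (1.0 + 0.75) / 2 = 0.875.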

### Non-small cell lung cancer (NSCLC) subtyping

NSCLC subtyping is another common benchmark task for slide-level classification, so we also evaluated this model on it.

On a subset of the test set containing 55 LUAD slides and 58 LUSC slides, the model achieved a balanced accuracy of 0.76.
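
The README does not state how the 32-way output was turned into a two-way RCC or NSCLC decision. One possible protocol, shown here only as an assumption, is to compare the scores of the two candidate classes and ignore the other 30; the alternative is to take the unrestricted argmax and count any other class as an error. Because softmax is monotonic, comparing logits or probabilities gives the same decision. We also assume the model emits one score per class in the `class_names` order of config.json; `restricted_prediction` is a hypothetical helper.

```python
# Class order as listed in config.json ("class_names").
CLASS_NAMES = [
    "ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "DLBC", "ESCA",
    "GBM", "HNSC", "KICH", "KIRC", "KIRP", "LGG", "LIHC", "LUAD",
    "LUSC", "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC",
    "SKCM", "STAD", "TGCT", "THCA", "THYM", "UCEC", "UCS", "UVM",
]


def restricted_prediction(logits, candidates, class_names=CLASS_NAMES):
    """Return the highest-scoring class among `candidates`, ignoring all other classes."""
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class")
    index = {name: i for i, name in enumerate(class_names)}
    return max(candidates, key=lambda name: logits[index[name]])
```

Under this protocol, a slide whose overall top class is, say, HNSC is still assigned to whichever of LUAD or LUSC scores higher, which generally yields a higher subtyping accuracy than the unrestricted argmax.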

## Intended uses

This model is intended **only** for research purposes.

**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.

config.json
ADDED
@@ -0,0 +1,41 @@
{
  "spec_version": "1.0",
  "type": "clam",
  "patch_size_um": 128,
  "feature_extractor": "ctranspath",
  "num_classes": 32,
  "class_names": [
    "ACC",
    "BLCA",
    "BRCA",
    "CESC",
    "CHOL",
    "COAD",
    "DLBC",
    "ESCA",
    "GBM",
    "HNSC",
    "KICH",
    "KIRC",
    "KIRP",
    "LGG",
    "LIHC",
    "LUAD",
    "LUSC",
    "MESO",
    "OV",
    "PAAD",
    "PCPG",
    "PRAD",
    "READ",
    "SARC",
    "SKCM",
    "STAD",
    "TGCT",
    "THCA",
    "THYM",
    "UCEC",
    "UCS",
    "UVM"
  ]
}
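
A consumer of this config will typically check that `num_classes` matches `class_names` and use the list to label model outputs. The sketch below assumes the model returns one raw logit per class in `class_names` order, which config.json implies but does not state; `load_config`, `validate_config`, and `top_k` are hypothetical helpers, not an API shipped with this repository.

```python
import json
import math


def validate_config(cfg):
    """Raise ValueError if class metadata in the config is inconsistent."""
    names = cfg["class_names"]
    if cfg["num_classes"] != len(names):
        raise ValueError(f"num_classes={cfg['num_classes']} but {len(names)} class names")
    if len(set(names)) != len(names):
        raise ValueError("duplicate class names")


def load_config(path):
    """Read config.json from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    validate_config(cfg)
    return cfg


def top_k(logits, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs after a softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in order[:k]]
```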

model-with-instance-classifiers.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c15dcf4dd1d901acd0581850edae97d837fbc920c6f0592baebcb7c7aa2542e
size 2830572

model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e0539cb88046f2b8515c149b525af8f05e505e5e81e59b5405ebdf07b64e4f4
size 2693188

torchscript_model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c9737b34ba3c1041de80e9b1f42096ebf11438acf0a80d542fdf8d9aed7ed98
size 2711792