upload model weights, readme, and config

Files changed (5)

README.md +76 -3
config.json +41 -0
model-with-instance-classifiers.safetensors +3 -0
model.safetensors +3 -0
torchscript_model.pt +3 -0

README.md CHANGED

@@ -1,3 +1,76 @@
----
-license: cc-by-4.0
----

+# Pancancer tissue classifier
+This model classifies whole slide images into one of 32 TCGA cancer types. It was trained by Jakub Kaczmarzyk using CLAM.
+Output classes: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS, UVM.
+Please see the [TCGA study abbreviations](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations) to map these class names to the TCGA study names.
+## Data
+Diagnostic slides in TCGA (i.e., slides labeled `DX`) were used to train the model. The whole slide images were tiled into 128x128 µm patches, and each patch was encoded using CTransPath, which produces 768-dimensional embeddings.
+Train, validation, and test splits were stratified by TCGA study, and patients did not cross split boundaries.
+Sample sizes:
+- Train: 9,257 slides (7,633 patients)
+- Validation: 1,186 slides (955 patients)
+- Test: 1,163 slides (955 patients)
+## Model performance
+The model achieved a weighted average AUROC of 0.99 (one-vs-rest).
+Here are the one-vs-rest AUROC values for each TCGA study.
+- ACC: 0.9993
+- BLCA: 0.9814
+- BRCA: 0.9908
+- CESC: 0.9868
+- CHOL: 0.9972
+- COAD: 0.9927
+- DLBC: 0.9996
+- ESCA: 0.9571
+- GBM: 0.9984
+- HNSC: 0.9974
+- KICH: 0.9998
+- KIRC: 0.9993
+- KIRP: 0.9952
+- LGG: 0.9984
+- LIHC: 0.9988
+- LUAD: 0.9879
+- LUSC: 0.9868
+- MESO: 0.9961
+- OV: 0.9900
+- PAAD: 0.9897
+- PCPG: 0.9944
+- PRAD: 1.0000
+- READ: 0.9752
+- SARC: 0.9946
+- SKCM: 0.9957
+- STAD: 0.9932
+- TGCT: 0.9957
+- THCA: 1.0000
+- THYM: 0.9991
+- UCEC: 0.9971
+- UCS: 0.9863
+- UVM: 0.9997
+### Renal cell carcinoma (RCC) subtyping
+RCC subtyping is a relatively common benchmark task for slide-level classification. We evaluate this model on RCC subtyping.
+When tested on a set of 52 KIRC slides and 28 KIRP slides (from the overall test set), the model achieved a balanced accuracy of 0.88.
+### Non-small cell lung cancer (NSCLC) subtyping
+NSCLC subtyping is a relatively common benchmark task for slide-level classification. We evaluate this model on NSCLC subtyping.
+When tested on a set of 55 LUAD slides and 58 LUSC slides (from the overall test set), the model achieved a balanced accuracy of 0.76.
+## Intended uses
+This model is ONLY intended for research purposes.
+**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.
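
The README above reports balanced accuracy for the RCC and NSCLC subsets. As a minimal sketch of that metric (the mean of per-class recall), the following uses made-up labels, not the model's actual predictions. How the original evaluation treated predictions falling outside the two classes of interest is not stated in the README; in this sketch any such prediction would simply count as a miss.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels for illustration only (not results from this model).
y_true = ["KIRC"] * 4 + ["KIRP"] * 2
y_pred = ["KIRC", "KIRC", "KIRC", "KIRP", "KIRP", "KIRC"]

# KIRC recall is 3/4 and KIRP recall is 1/2, so the mean is 0.625.
print(balanced_accuracy(y_true, y_pred))
```

Because each class contributes equally regardless of its slide count, this metric is less sensitive to class imbalance (for example, 52 KIRC versus 28 KIRP slides) than plain accuracy.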

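As a rough sanity check on the per-study AUROC values listed in the README, the sketch below computes their unweighted (macro) mean. This is not the weighted average the README reports: weighting requires the number of test slides per study, which the README does not give. The values are copied from the list above.

```python
# One-vs-rest AUROC per TCGA study, copied from the README.
per_class_auroc = {
    "ACC": 0.9993, "BLCA": 0.9814, "BRCA": 0.9908, "CESC": 0.9868,
    "CHOL": 0.9972, "COAD": 0.9927, "DLBC": 0.9996, "ESCA": 0.9571,
    "GBM": 0.9984, "HNSC": 0.9974, "KICH": 0.9998, "KIRC": 0.9993,
    "KIRP": 0.9952, "LGG": 0.9984, "LIHC": 0.9988, "LUAD": 0.9879,
    "LUSC": 0.9868, "MESO": 0.9961, "OV": 0.9900, "PAAD": 0.9897,
    "PCPG": 0.9944, "PRAD": 1.0000, "READ": 0.9752, "SARC": 0.9946,
    "SKCM": 0.9957, "STAD": 0.9932, "TGCT": 0.9957, "THCA": 1.0000,
    "THYM": 0.9991, "UCEC": 0.9971, "UCS": 0.9863, "UVM": 0.9997,
}

# Unweighted mean: every study counts equally, whatever its slide count.
macro_auroc = sum(per_class_auroc.values()) / len(per_class_auroc)

# Study with the lowest one-vs-rest AUROC.
lowest = min(per_class_auroc, key=per_class_auroc.get)

print(f"studies: {len(per_class_auroc)}")
print(f"macro-average AUROC: {macro_auroc:.2f}")
print(f"lowest: {lowest} ({per_class_auroc[lowest]:.4f})")
```

The macro mean also rounds to 0.99, and ESCA is the study with the lowest value.
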
config.json ADDED

@@ -0,0 +1,41 @@

+{
+    "spec_version": "1.0",
+    "type": "clam",
+    "patch_size_um": 128,
+    "feature_extractor": "ctranspath",
+    "num_classes": 32,
+    "class_names": [
+        "ACC",
+        "BLCA",
+        "BRCA",
+        "CESC",
+        "CHOL",
+        "COAD",
+        "DLBC",
+        "ESCA",
+        "GBM",
+        "HNSC",
+        "KICH",
+        "KIRC",
+        "KIRP",
+        "LGG",
+        "LIHC",
+        "LUAD",
+        "LUSC",
+        "MESO",
+        "OV",
+        "PAAD",
+        "PCPG",
+        "PRAD",
+        "READ",
+        "SARC",
+        "SKCM",
+        "STAD",
+        "TGCT",
+        "THCA",
+        "THYM",
+        "UCEC",
+        "UCS",
+        "UVM"
+    ]
+}
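
The `class_names` list in this config can be used to turn a vector of model scores into a TCGA study abbreviation. The sketch below assumes that output index `i` corresponds to `class_names[i]`; the commit does not state this ordering explicitly, so treat it as an assumption. The helper names `load_class_names` and `predict_label` are hypothetical and not part of any library.

```python
import json


def load_class_names(config_path):
    """Read class names from a config.json shaped like the one in this commit."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    class_names = config["class_names"]
    # Guard against a config whose declared size disagrees with its list.
    if len(class_names) != config["num_classes"]:
        raise ValueError("num_classes does not match the length of class_names")
    return class_names


def predict_label(class_names, scores):
    """Return the class name with the highest score.

    Assumes scores[i] belongs to class_names[i] (not confirmed by the source).
    """
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best_index]
```

For example, `predict_label(load_class_names("config.json"), scores)` would return an abbreviation such as `"BRCA"` for a 32-element `scores` sequence.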

model-with-instance-classifiers.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8c15dcf4dd1d901acd0581850edae97d837fbc920c6f0592baebcb7c7aa2542e
+size 2830572

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e0539cb88046f2b8515c149b525af8f05e505e5e81e59b5405ebdf07b64e4f4
+size 2693188

torchscript_model.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c9737b34ba3c1041de80e9b1f42096ebf11438acf0a80d542fdf8d9aed7ed98
+size 2711792
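
The three weight files above are stored as Git LFS pointers: each records a spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of parsing such a pointer and checking a downloaded file against it follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the pointer text is the `torchscript_model.pt` entry from this commit without the leading `+` diff markers.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at path has the size and hash the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1c9737b34ba3c1041de80e9b1f42096ebf11438acf0a80d542fdf8d9aed7ed98\n"
    "size 2711792\n"
)
print(pointer["algo"], pointer["size"])
```

After downloading the real file, `matches_pointer("torchscript_model.pt", pointer)` would confirm that the local copy is the object this commit refers to.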