# pytorch-caney
Python package providing a collection of PyTorch tools.
[DOI](https://zenodo.org/badge/latestdoi/472450059)
[Code style: black](https://github.com/psf/black)
[Coverage Status](https://coveralls.io/github/nasa-nccs-hpda/pytorch-caney?branch=main)
## Documentation
- Latest: https://nasa-nccs-hpda.github.io/pytorch-caney/latest
## Objectives
- Library to process remote sensing imagery using GPU and CPU parallelization.
- Machine Learning and Deep Learning image classification and regression.
- Agnostic array and vector-like data structures.
- User interface environments via notebooks for easy-to-use AI/ML projects.
- Example notebooks for a quick start on AI/ML with your own data.
## Installation
This library is intended to accelerate the development of data science products
for remote sensing satellite imagery and other applications. pytorch-caney can be installed
by itself, but instructions for installing the full environments are listed under the `requirements`
directory so that projects, examples, and notebooks can be run.

Note: pip installations do not include the CUDA libraries needed for GPU support. If you are not
using conda/mamba, make sure the NVIDIA libraries are installed locally on the system.

To build the container as a Singularity sandbox:
```bash
module load singularity # if a module needs to be loaded
singularity build --sandbox pytorch-caney-container docker://nasanccs/pytorch-caney:latest
```
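As a quick check related to the note above about CUDA libraries, the following minimal sketch reports whether the installed PyTorch can see a GPU. The helper name `describe_gpu_support` is hypothetical and not part of pytorch-caney; it relies only on `torch.cuda.is_available()` and `torch.cuda.device_count()`, and returns a message instead of failing when PyTorch is not installed.

```python
def describe_gpu_support() -> str:
    """Return a one-line summary of the GPU support visible to PyTorch."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment."

    if torch.cuda.is_available():
        return f"CUDA is available with {torch.cuda.device_count()} visible GPU(s)."
    return "PyTorch is installed, but no CUDA device is visible."


if __name__ == "__main__":
    print(describe_gpu_support())
```

Inside the container, run this from a shell started with `singularity shell --nv` so that the host GPU is visible.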
## Why Caney?
"Caney" means longhouse in Taíno.
## Contributors
- Jordan Alexis Caraballo-Vega, [email protected]
- Caleb Spradlin, [email protected]
## Contributing
Please see our [guide for contributing to pytorch-caney](CONTRIBUTING.md).
## SatVision
| name | pretrain | resolution | #params |
| :---: | :---: | :---: | :---: |
| SatVision-B | MODIS-1.9-M | 192x192 | 84.5M |
## SatVision Datasets
| name | bands | resolution | #chips |
| :---: | :---: | :---: | :---: |
| MODIS-Small | 7 | 128x128 | 1,994,131 |
## MODIS Surface Reflectance (MOD09GA) Band Details
| Band Name | Bandwidth (µm) |
| :------------: | :-----------: |
| sur_refl_b01_1 | 0.620 - 0.670 |
| sur_refl_b02_1 | 0.841 - 0.876 |
| sur_refl_b03_1 | 0.459 - 0.479 |
| sur_refl_b04_1 | 0.545 - 0.565 |
| sur_refl_b05_1 | 1.230 - 1.250 |
| sur_refl_b06_1 | 1.628 - 1.652 |
| sur_refl_b07_1 | 2.105 - 2.155 |
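When working with the seven bands in code, it can help to keep the table above as a lookup. The sketch below copies the wavelength ranges from the table (in micrometers) and derives each band's center wavelength. The names `MODIS_BANDS`, `band_center`, and `bands_by_wavelength` are hypothetical helpers for illustration, not part of pytorch-caney.

```python
from typing import Dict, List, Tuple

# Wavelength ranges in micrometers, copied from the MOD09GA band table.
MODIS_BANDS: Dict[str, Tuple[float, float]] = {
    "sur_refl_b01_1": (0.620, 0.670),
    "sur_refl_b02_1": (0.841, 0.876),
    "sur_refl_b03_1": (0.459, 0.479),
    "sur_refl_b04_1": (0.545, 0.565),
    "sur_refl_b05_1": (1.230, 1.250),
    "sur_refl_b06_1": (1.628, 1.652),
    "sur_refl_b07_1": (2.105, 2.155),
}


def band_center(band_name: str) -> float:
    """Return the midpoint of a band's wavelength range in micrometers."""
    low, high = MODIS_BANDS[band_name]
    return (low + high) / 2.0


def bands_by_wavelength() -> List[str]:
    """Return band names ordered from shortest to longest center wavelength."""
    return sorted(MODIS_BANDS, key=band_center)


if __name__ == "__main__":
    for name in bands_by_wavelength():
        print(f"{name}: center {band_center(name):.4f} um")
```

Note that the band numbering does not follow wavelength order: sorted by center wavelength, band 3 comes first and band 7 last.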
## Pre-training with Masked Image Modeling
To pre-train the SwinV2 base model with masked image modeling (MIM), run:
```bash
torchrun --nproc_per_node <NGPUS> pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py --cfg <config-file> --dataset <dataset-name> --data-paths <path-to-data-subfolder-1> --batch-size <batch-size> --output <output-dir> --enable-amp
```
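If the launch needs to be scripted, for example from a job submission helper, the command template above can be assembled in Python. This is a minimal sketch: `build_mim_command` is a hypothetical helper, not part of pytorch-caney, and it assumes that `--data-paths` accepts one or more paths, as the shell glob in the example further down suggests. Only flags that appear in the template are used.

```python
import shlex
from typing import List, Sequence

# Script path as given in the command template.
MIM_SCRIPT = "pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py"


def build_mim_command(
    ngpus: int,
    cfg: str,
    dataset: str,
    data_paths: Sequence[str],
    batch_size: int,
    output: str,
    enable_amp: bool = True,
) -> List[str]:
    """Build the torchrun argument list for MIM pre-training."""
    if ngpus < 1:
        raise ValueError("ngpus must be at least 1")
    if not data_paths:
        raise ValueError("at least one data path is required")
    command = [
        "torchrun", "--nproc_per_node", str(ngpus), MIM_SCRIPT,
        "--cfg", cfg,
        "--dataset", dataset,
        "--data-paths", *data_paths,  # assumption: multiple paths are allowed
        "--batch-size", str(batch_size),
        "--output", output,
    ]
    if enable_amp:
        command.append("--enable-amp")
    return command


if __name__ == "__main__":
    cmd = build_mim_command(
        ngpus=4,
        cfg="pytorch-caney/examples/satvision/"
            "mim_pretrain_swinv2_satvision_base_192_window12_800ep.yaml",
        dataset="MODIS",
        data_paths=["/path/to/data/training_0"],  # placeholder path
        batch_size=128,
        output=".",
    )
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths.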
For example, to run on a compute node with 4 GPUs and a batch size of 128 on the MODIS SatVision pre-training dataset with a base SwinV2 model, run:
```bash
singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
Singularity> export PYTHONPATH=$PWD:$PWD/pytorch-caney
Singularity> torchrun --nproc_per_node 4 pytorch-caney/pytorch_caney/pipelines/pretraining/mim.py --cfg pytorch-caney/examples/satvision/mim_pretrain_swinv2_satvision_base_192_window12_800ep.yaml --dataset MODIS --data-paths /explore/nobackup/projects/ilab/data/satvision/pretraining/training_* --batch-size 128 --output . --enable-amp
```
The example script below runs the exact configuration used to pre-train the SatVision-base model with MIM on the MODIS pre-training dataset.
```bash
singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
Singularity> cd pytorch-caney/examples/satvision
Singularity> ./run_satvision_pretrain.sh
```
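For readers new to masked image modeling, the idea behind the pre-training objective is that a random subset of image patches is hidden and the model learns to reconstruct it. The snippet below is an illustrative, pure-Python mask generator in the style of SimMIM. It is not the code used by pytorch-caney: the function name `random_patch_mask`, the mask patch size of 32, and the mask ratio of 0.6 are assumptions chosen for illustration. Only the 192x192 input size comes from the SatVision table.

```python
import math
import random
from typing import List, Optional


def random_patch_mask(
    image_size: int = 192,
    patch_size: int = 32,      # assumed value, for illustration only
    mask_ratio: float = 0.6,   # assumed value, for illustration only
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Return a square grid of flags where 1 marks a patch hidden from the model."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")

    grid = image_size // patch_size          # patches per side
    num_patches = grid * grid
    num_masked = math.ceil(num_patches * mask_ratio)

    # Pick the masked patch indices uniformly at random, without replacement.
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))

    return [
        [1 if row * grid + col in masked else 0 for col in range(grid)]
        for row in range(grid)
    ]


if __name__ == "__main__":
    for row in random_patch_mask(seed=0):
        print(" ".join(str(flag) for flag in row))
```

During training, a mask of this kind is typically regenerated for every sample, so the model sees different visible patches each time.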
## Fine-tuning SatVision-base
To fine-tune the pre-trained SatVision-base model, run:
```bash
torchrun --nproc_per_node <NGPUS> pytorch-caney/pytorch_caney/pipelines/finetuning/finetune.py --cfg <config-file> --pretrained <path-to-pretrained> --dataset <dataset-name> --data-paths <path-to-data-subfolder-1> --batch-size <batch-size> --output <output-dir> --enable-amp
```
See the example config files `pytorch-caney/examples/satvision/finetune_satvision_base_*.yaml` for how to structure your config file for fine-tuning.
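To list the example fine-tuning configs from Python, the file pattern named above can be matched with `pathlib`. This is a small sketch: `find_finetune_configs` is a hypothetical helper, and the directory passed in the usage example assumes the repository was cloned into the current working directory.

```python
from pathlib import Path
from typing import List

# File name pattern taken from the sentence above.
FINETUNE_PATTERN = "finetune_satvision_base_*.yaml"


def find_finetune_configs(examples_dir: str) -> List[Path]:
    """Return the fine-tuning config files in examples_dir, sorted by name."""
    return sorted(Path(examples_dir).glob(FINETUNE_PATTERN))


if __name__ == "__main__":
    # Assumes the repository is cloned into the current directory.
    for config in find_finetune_configs("pytorch-caney/examples/satvision"):
        print(config)
```

Any path printed here can be passed as the `--cfg` argument of the fine-tuning command.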
## Testing
To run linting and the unit tests, use the commands below. They execute both in a temporary venv environment that is used only for testing.
```bash
git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
cd pytorch-caney; bash test.sh
```
Alternatively, run the unit tests directly inside the container (first block) or in a conda environment (second block):
```bash
git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
singularity build --sandbox pytorch-caney-container docker://nasanccs/pytorch-caney:latest
singularity shell --nv -B <mounts> /path/to/container/pytorch-caney-container
cd pytorch-caney; python -m unittest discover pytorch_caney/tests
```
```bash
git clone git@github.com:nasa-nccs-hpda/pytorch-caney.git
cd pytorch-caney; conda env create -f requirements/environment_gpu.yml;
conda activate pytorch-caney
python -m unittest discover pytorch_caney/tests
```
## References
- [PyTorch Lightning](https://github.com/Lightning-AI/lightning)
- [Swin Transformer](https://github.com/microsoft/Swin-Transformer)
- [SimMIM](https://github.com/microsoft/SimMIM)