Commit d7e87dac authored by Carina Lansing

Updated the shifter and singularity instructions and documentation in the run.sh file.

parent 63d120a0
# LASSO-O Shifter Instructions
Follow these instructions if you will be running LASSO-O at NERSC
or on an HPC cluster with a Shifter module available.
## Setting up the Shifter environment on your HPC cluster
If you are running on NERSC, Shifter will be available in the
default environment. If you are running on an HPC
cluster other than NERSC, Shifter should be available via a module load.
To see if Shifter is available on your cluster, type the following:
```bash
$ module avail
```
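On clusters with a large module catalog it can help to filter that listing. `module avail` prints to stderr, so redirect it before piping to `grep`:
```bash
# Search the module list for anything whose name mentions shifter
$ module avail 2>&1 | grep -i shifter
```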
#### Load Shifter CLI on your HPC system
If you are not on NERSC, use `module load` to load the Shifter
command-line client. For example:
```bash
$ module load shifter/3.7.1
```
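To double-check that the client is available before submitting anything, a quick PATH lookup is enough:
```bash
# Confirm the shifter executable is now on your PATH
$ which shifter
```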
Once in your compute node shell with the Shifter module loaded,
you can run LASSO-O using the following instructions.
## Running LASSO-O via Shifter
You can start LASSO-O by executing the `run.sh` script from the **run-lasso-o_shcu** folder via a scheduler command.
Below we provide examples for the Slurm and PBS schedulers. (The Slurm
scheduler is used at NERSC.)
These commands instruct the scheduler to run LASSO-O with one node
and one CPU. Currently, the LASSO-O container executes each process
in sequence; support for multiple cores will be added later.
Until then, LASSO-O will take a while to run depending upon the number of simulations
you are processing. In addition, the first time you run the lasso-o_shcu container,
the container runtime will download the container image, which will also take a few minutes.
<div style="background-color: #F9F5D2; border: 1px solid grey; margin: 10px; padding: 10px;">
<strong>NOTE: </strong>
If you do not use a scheduler to invoke the script, it will run on the login node,
which may be killed per host policy if it runs for too long.
</div>
When your job has completed, you may view the outputs created in the
`run-lasso-o_shcu/data/outputs` folder using the notebooks provided.
See the [notebooks/README.md](notebooks/README.md) file for more
information.
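If you just want to confirm that output files were produced before opening the notebooks, a plain directory listing is enough (the path is relative to the repository root; adjust it if your run folder lives elsewhere):
```bash
# List the LASSO-O outputs written by the job
$ ls -lh run-lasso-o_shcu/data/outputs
```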
#### Start via Slurm
* **Step 1: Run job**
```bash
$ srun --verbose --nodes=1 --ntasks=1 --cpus-per-task=1 ./run.sh
```
* **Step 2: Check job status**
To check the status of your job, list the queue for your user ID:
```bash
$ squeue -u [user_name]
```
You can also get more information about the running job using the scontrol command
and the jobID printed out from the squeue command:
```bash
$ scontrol show jobid -dd [jobID]
```
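If you would rather queue the work and log out instead of keeping `srun` in the foreground, a minimal Slurm batch script along these lines should behave the same way; the file name and time limit below are only placeholders:
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=04:00:00   # placeholder time limit; adjust to your site's policy

# run.sh lives in the run-lasso-o_shcu folder, so submit this script from there
./run.sh
```
Save it as, say, `run_lasso.sbatch` (any name works), submit it with `sbatch run_lasso.sbatch`, and monitor it with the same `squeue` and `scontrol` commands shown above.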
#### Start via PBS
PBS jobs must be started with a batch script:
* **Step 1: Edit batch script**
To run LASSO-O, first edit
the `pbs_sub.sh` file to use the appropriate parameters for your
environment. In particular, the account name, group_list, and QoS
parameters should definitely be changed, but other parameters may also be adjusted
as needed. In addition, the `module load` commands should be adjusted
to load the Shifter module that is available at your cluster (a sketch of the
kind of header this involves follows this list).
* **Step 2: Submit job**
After you have edited the batch script, you should be able to submit a batch job via:
```bash
$ qsub pbs_sub.sh -d .
```
* **Step 3: Check job status**
To check the status of your job, list the queue for your user ID:
```bash
$ qstat -u [user_name]
```
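As mentioned in Step 1 above, here is a rough sketch of the kind of header `pbs_sub.sh` contains. Every value is a placeholder rather than a copy of the real file, and the exact directives your site accepts (particularly for QoS) may differ:
```bash
#!/bin/bash
#PBS -A my_account               # account name: replace with your allocation
#PBS -W group_list=my_group      # group_list: replace with your project group
#PBS -l qos=normal               # QoS: site-specific; replace or remove as needed
#PBS -l nodes=1:ppn=1            # one node, one CPU, matching the Slurm example
#PBS -l walltime=04:00:00        # placeholder time limit

# Load the Shifter module available on your cluster
module load shifter/3.7.1

# Submitting with `qsub pbs_sub.sh -d .` sets the working directory to the
# run folder, so run.sh can be invoked with a relative path
./run.sh
```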
@@ -52,41 +52,52 @@ information.
#### Start via Slurm
* **Step 1: Run job**
```bash
$ srun --verbose --nodes=1 --ntasks=1 --cpus-per-task=1 ./run.sh
```
* **Step 2: Check job status**
To check the status of your job, list the queue for your user ID:
```bash
$ squeue -u [user_name]
```
You can also get more information about the running job using the scontrol command
and the jobID printed out from the squeue command:
```bash
$ scontrol show jobid -dd [jobID]
```
#### Start via PBS
PBS jobs must be started with a batch script:
* **Step 1: Edit batch script**
To run LASSO-O, first edit
the `pbs_sub.sh` file to use the appropriate parameters for your
environment. In particular, the account name, group_list, and QoS
parameters should definitely be changed, but other parameters may also be adjusted
as needed. In addition, the `module load` commands should be adjusted
to load the Singularity module that is available at your cluster.
* **Step 2: Submit job**
After you have edited the batch script, you should be able to submit a batch job via:
```bash
$ qsub pbs_sub.sh -d .
```
* **Step 3: Check job status**
To check the status of your job, list the queue for your user ID:
```bash
$ qstat -u [user_name]
```
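For a more detailed look at a single job (roughly the PBS counterpart of `scontrol show jobid` above), you can ask `qstat` for the full job record using the job ID printed by `qsub`:
```bash
# Show the full record for one job, including its resource requests and state
$ qstat -f [jobID]
```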
@@ -14,15 +14,15 @@ set -e
show_help() {
echo ""
echo -e "$GREEN--------------------------------------------------------------------------$NC"
echo -e "$GREEN This script helps you to run the LASSO-O container in either Docker $NC"
echo -e "$GREEN or Singularity environments. $NC"
echo -e "$GREEN This script helps you to run the LASSO-O container in either Docker, $NC"
echo -e "$GREEN Singularity, or Shifter environments. $NC"
echo -e "$GREEN--------------------------------------------------------------------------$NC"
echo ""
echo "SYNTAX: ./run.sh [-h]"
echo ""
echo "PREREQUISITES: "
echo " 1) Make sure your Docker or Singularity environments are "
echo " available. See README.md for more information on setting "
echo "PREREQUISITES: "
echo " 1) Make sure your Docker, Singularity, or Shifter environments "
echo " are available. See README.md for more information on setting"
echo " up your container runtime environment."
echo ""
echo " 2) Make sure to configure the config.yml file with the "
@@ -101,22 +101,18 @@ run_singularity() {
}
run_shifter() {
# Note that it appears that shifter requires admins to pull the image
# into their cache in order to run. On NERSC, I had to submit a ticket
# for the image to be pulled, and it looks like we have to do the same thing on cumulus!
# Older versions of shifter don't support the --env parameter, so we are exporting this
# variable directly into the environment so it will be inherited by the shifter container.
export BEGIN_DATETIME=$begin_datetime
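# The --volume flags mount the configured input and output folders into the
# container at /data/lasso/inputs and /data/lasso/outputs, and --image selects
# the lasso-o_shcu image from the project's GitLab container registry.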
shifter \
--env=BEGIN_DATETIME=$begin_datetime \
--volume=$input_folder:/data/lasso/inputs \
--volume=$output_folder:/data/lasso/outputs \
--image=docker:registry.gitlab.com/gov-doe-arm/docker/lasso-o_shcu -- /apps/base/python3.6/bin/python /bin/run_lasso.py
}
argument="$1"
......