UMIACS Servers

Latest revision as of 15:23, 15 June 2023

Notes on using UMIACS servers

Modules

Use modules to load programs you need to run.

Notes

You can load modules in your .bashrc file

# List loaded modules
module list

# Load a module
module load [my_module]

# List all available modules
module avail

Some useful modules in my .bashrc file

module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
module load git
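A minimal sketch of loading that list defensively from .bashrc. The helper name try_module_load is hypothetical, and it assumes module load returns a nonzero status on failure, which some older Environment Modules versions may not do.

```shell
# Hypothetical helper for .bashrc: load modules one by one and keep going on failure.
try_module_load() {
  # On machines without the modules system, skip instead of erroring out.
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not found; skipping: $*" >&2
    return 0
  fi
  for mod in "$@"; do
    # Warn about a missing module but still try the remaining ones.
    module load "$mod" || echo "warning: could not load $mod" >&2
  done
}

try_module_load tmux cuda/10.0.130 cudnn/v7.5.0 Python3/3.7.6 git
```

Loading one module per call makes it easier to see which version string is stale after the cluster's module tree changes.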

Python

Do not install Anaconda in your home directory; you will run out of space.
Load the Python 3 module by adding the following to your .bashrc file:

module load Python3/3.7.6
export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"

Then run the following to get pip installed

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --user

Notes

Install packages with pip install --user
You may need to add your local site-packages to your PYTHONPATH environment variable
- Add this to .bashrc:
- export PYTHONPATH="${PYTHONPATH}:/nfshomes/$(whoami)/.local/lib/python3.7/site-packages/"
You can also install using pip install --target=/my-libs-folder/
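The PYTHONPATH line above hard-codes python3.7. As a sketch, the same directory can be asked of the interpreter itself, so the line keeps working if the Python module version changes. The function name user_site_dir is hypothetical; site.getusersitepackages() is part of the standard library.

```shell
# Print the per-user site-packages directory of the active python3.
# Fails (returns 1) if python3 is not on the PATH, e.g. before the module is loaded.
user_site_dir() {
  command -v python3 >/dev/null 2>&1 || return 1
  python3 -c 'import site; print(site.getusersitepackages())'
}

# Append the directory to PYTHONPATH only when it could be determined.
if site_dir="$(user_site_dir)"; then
  export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${site_dir}"
fi
```

This has to run after module load Python3/3.7.6 in .bashrc, otherwise python3 may be missing or point at a different interpreter.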

Conda

If you must install conda, install it somewhere with a lot of space like scratch.

Install PyTorch

pip install --user torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html

Installing Packages to a Directory

pip install geographiclib -t /scratch1/davidli/python/
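Packages installed with -t are not importable until the target directory is on PYTHONPATH. A small sketch, assuming a hypothetical helper named add_to_pythonpath, that prepends a directory once and ignores repeats so that re-sourcing .bashrc does not grow the variable:

```shell
# Prepend a directory to PYTHONPATH unless it is already listed.
add_to_pythonpath() {
  case ":${PYTHONPATH:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) PYTHONPATH="$1${PYTHONPATH:+:${PYTHONPATH}}" ;;
  esac
  export PYTHONPATH
}

# Same directory as the pip install -t example above.
add_to_pythonpath /scratch1/davidli/python/
```

Prepending means packages in the target directory take precedence over same-named packages elsewhere on the path; append instead if that is not wanted.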

MBRC Cluster

See UMIACS MBRC

SLURM Job Management

See https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/

1 GPU

srun --pty --gres=gpu:1 --mem=16G --qos=high --time=47:59:00 -w mbrc00 bash

2 GPUs on mbrc00

srun --pty --gres=gpu:2 --mem=16G --qos=default --time=23:59:00 -w mbrc00 bash

CPU-only on scavenger QOS

srun --pty --account=scavenger --partition=scavenger \
     --time=3:59:00 \
     --mem=1G -c1 -w mbrc00 bash

Notes

Change -w mbrc00 to -w mbrc01 to run on mbrc01
Add -c 4 to request 4 cores
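The GPU commands above differ only in a few values. As an illustration, a hypothetical helper named build_srun can assemble the command from its parts and print it, so it can be checked before running. It only uses flags that already appear in these notes.

```shell
# Print (do not run) an interactive srun command.
# Usage: build_srun GPUS MEM QOS TIME [NODE]
build_srun() {
  srun_gpus="$1"; srun_mem="$2"; srun_qos="$3"; srun_limit="$4"; srun_node="${5:-}"
  srun_cmd="srun --pty --gres=gpu:${srun_gpus} --mem=${srun_mem} --qos=${srun_qos} --time=${srun_limit}"
  # Pin the job to a node only when one was given.
  if [ -n "$srun_node" ]; then
    srun_cmd="${srun_cmd} -w ${srun_node}"
  fi
  echo "${srun_cmd} bash"
}

build_srun 1 16G high 47:59:00 mbrc00
# prints: srun --pty --gres=gpu:1 --mem=16G --qos=high --time=47:59:00 -w mbrc00 bash
build_srun 2 16G default 23:59:00
# prints: srun --pty --gres=gpu:2 --mem=16G --qos=default --time=23:59:00 bash
```

To actually start the job, paste the printed line or run eval "$(build_srun 1 16G high 47:59:00 mbrc00)".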

See Jobs

See my own jobs

squeue -u <user> -o "%8i %10P %8j %10u %10L %5b"

Formatting

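%L is the time remaining before the job reaches its time limit
%b is the generic resources (GRES) the job requested, i.e. the number of GPUs here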
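To avoid retyping the format string, the squeue call can be wrapped in a shell function in .bashrc. This is a sketch; the names SQUEUE_FORMAT and myjobs are hypothetical.

```shell
# Column layout from the squeue example above.
SQUEUE_FORMAT="%8i %10P %8j %10u %10L %5b"

# List jobs for the given user, defaulting to the current user.
myjobs() {
  squeue -u "${1:-$(whoami)}" -o "$SQUEUE_FORMAT"
}
```

Run myjobs for your own jobs, or myjobs <user> for someone else's.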

See all jobs

squeue

SFTP

Note: If you know of an easier way, please tell me.

On your PC:
Start an sshd for forwarding. You can run it in a Docker container for privacy.

On the cluster:
Generate an sshd host key:

ssh-keygen -t ed25519 -a 100 -f /nfshomes/dli7319/ssh/ssh_host_ed25519_key

Create the following sshd_config file

#	$OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $
Port 5981
HostKey /nfshomes/dli7319/ssh/ssh_host_ed25519_key
AuthorizedKeysFile	.ssh/authorized_keys
Subsystem	sftp	/usr/libexec/openssh/sftp-server

Start the sshd daemon and proxy the port to your local sshd. You can make a script like this:

#!/bin/bash

LOCAL_PORT=5981
REMOTE_PORT=22350
REMOTE_SSH_PORT=22450
REMOTE_ADDR=$(echo "$SSH_CONNECTION" | awk '{print $1}')

/usr/sbin/sshd -D -f sshd_config & \
ssh -R $REMOTE_PORT:localhost:$LOCAL_PORT root@$REMOTE_ADDR -p $REMOTE_SSH_PORT

On your PC:
Proxy the sshd from the local Docker container to your localhost.
Connect to the sshd on the cluster.
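One possible shape for the two PC-side steps, written as commands that are printed rather than run. This is a sketch under assumptions: the container's sshd is reachable from the PC on localhost port 22450, the reverse-forwarded port 22350 only listens inside the container, and your cluster user name matches your local one unless CLUSTER_USER is set. The function names are hypothetical.

```shell
# Ports match the cluster-side script above.
REMOTE_PORT=22350       # port inside the container that leads to the cluster's sshd
REMOTE_SSH_PORT=22450   # port on which the container's own sshd accepts logins
CLUSTER_USER="${CLUSTER_USER:-$(whoami)}"

# Step 1: forward the container's port to the same port on the PC.
tunnel_cmd() {
  printf 'ssh -N -L %s:localhost:%s -p %s root@localhost\n' \
    "$REMOTE_PORT" "$REMOTE_PORT" "$REMOTE_SSH_PORT"
}

# Step 2: open an SFTP session to the cluster's sshd through that port.
sftp_cmd() {
  printf 'sftp -P %s %s@localhost\n' "$REMOTE_PORT" "$CLUSTER_USER"
}

tunnel_cmd
sftp_cmd
```

Run the first printed command in one terminal and leave it open, then run the second in another. The cluster-side sshd authenticates against .ssh/authorized_keys, so your public key needs to be listed there.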

Class Accounts

See UMIACS Wiki: ClassAccounts

Class accounts have the lowest priority. If GPUs are available, you can use 1 GPU for up to 48 hours.
However, your home directory only has 18GB, and installing PyTorch takes ~3GB.
A conda environment will not fit there, so just use the Python module.

The ssh endpoint is

class.umiacs.umd.edu

Start a job with:

srun --pty --account=class --partition=class --gres=gpu:1 --mem=16G --qos=default --time=47:59:00 -c4 bash

My .bashrc

#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'

# Modules
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
alias python=python3

export PATH="${PATH}:${HOME}/bin/"
export PATH="${PATH}:${HOME}/.local/bin/"

`.bashrc`

My .bashrc

#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'

if test -f "/opt/rh/rh-php72/enable"; then
    source /opt/rh/rh-php72/enable
fi

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

command_exists() {
  type "$1" &> /dev/null ;
}


# Modules
if command_exists module ; then
  module load tmux
  module load cuda/10.2.89
  module load cudnn/v8.0.4
  module load Python3/3.7.6
  module load git/2.25.1
  module load gitlfs
  module load gcc/8.1.0
  module load openmpi/4.0.1
  module load ffmpeg
  module load rclone
fi
if command_exists python3 ; then
  alias python=python3
fi

if command_exists python3 ; then
  export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"
fi
export PYTHONPATH="${PYTHONPATH}:/nfshomes/dli7319/.local/lib/python3.7/site-packages/"

export PATH="${HOME}/bin/:${PATH}"

Software

git

The MBRC cluster has git available as a module.
You can then download a precompiled git-lfs binary and drop it in ~/bin/.
Make sure ${HOME}/bin is in your PATH and run git lfs install.

Notes

Make sure you have a recent version of git
- E.g. module load git/2.25.1

Copying Files

I copy files in the following ways:

For small files, you can copy to your home directory under /nfshomes/ via SFTP to the submission node. I rarely do this because the home directory is only a few gigabytes.
For large files and folders, I typically use rclone to copy to the cloud and then copy back to the scratch drives with a CPU-only job.
- You can store project files on Google Drive or the UMIACS object storage.
- Note that Google Drive has a limit on files per second and a daily limit of 750GB in transfers.
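The copy-back step can be sketched as a small wrapper around rclone copy. The function name fetch_to_scratch and the remote name gdrive are hypothetical (a remote is created beforehand with rclone config), and the --transfers and --tpslimit values are illustrative guesses for staying under Google Drive's per-second limits, not tuned numbers.

```shell
# Copy a remote folder to a local directory with a progress display.
# Usage: fetch_to_scratch REMOTE:PATH LOCAL_DIR
fetch_to_scratch() {
  # --tpslimit caps API transactions per second; --transfers caps parallel file copies.
  rclone copy "$1" "$2" --progress --transfers 4 --tpslimit 8
}

# Example, run inside a CPU-only job after "module load rclone":
#   fetch_to_scratch gdrive:my-project /scratch1/davidli/my-project
```

Swapping the two arguments uploads from scratch to the cloud with the same limits.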
