UMIACS Servers
Notes on using UMIACS servers
==Modules==
Use modules to load programs you need to run.<br>
;Notes
* You can load modules in your <code>.bashrc</code> file
<syntaxhighlight lang="bash">
# List loaded modules
module list
# Load a module
module load [my_module]
# List all available modules
module avail
</syntaxhighlight>
Some useful modules in my <code>.bashrc</code> file:
<syntaxhighlight lang="bash">
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
module load git
</syntaxhighlight>
==Python==
Do not install Anaconda in your home directory; you will run out of space.<br>
Load the Python 3 module by adding the following to your <code>.bashrc</code> file:
<syntaxhighlight lang="bash">
module load Python3/3.7.6
export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"
</syntaxhighlight>
Then run the following to install pip:
<syntaxhighlight lang="bash">
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --user
</syntaxhighlight>
;Notes
* You will need to install packages with <code>pip install --user</code>
* You may need to add your local site-packages to your <code>PYTHONPATH</code> environment variable
** Add this to <code>.bashrc</code>:
** <code>export PYTHONPATH="${PYTHONPATH}:/nfshomes/$(whoami)/.local/lib/python3.7/site-packages/"</code>
* You can also install to a specific directory using <code>pip install --target=/my-libs-folder/</code>
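To see where <code>pip install --user</code> puts things, you can ask Python's <code>site</code> module directly (a quick sketch; assumes <code>python3</code> is on your PATH):

```bash
# Query the user-install locations that `pip install --user` targets
USER_BASE=$(python3 -c 'import site; print(site.USER_BASE)')   # e.g. ~/.local
USER_SITE=$(python3 -c 'import site; print(site.USER_SITE)')   # user site-packages
echo "Scripts go to: ${USER_BASE}/bin"
echo "Packages go to: ${USER_SITE}"
```

This is the same <code>site.USER_BASE</code> lookup the <code>.bashrc</code> snippet above uses to put user-installed scripts on your PATH.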
===Conda===
If you must install conda, install it somewhere with plenty of space, such as your scratch directory.
===Install PyTorch===
<syntaxhighlight lang="bash">
pip install --user torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
</syntaxhighlight>
===Installing Packages to a Directory===
<syntaxhighlight lang="bash">
pip install geographiclib -t /scratch1/davidli/python/
</syntaxhighlight>
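For Python to find packages installed with <code>-t</code>/<code>--target</code>, the directory also has to be on <code>PYTHONPATH</code>. A minimal sketch (the <code>/tmp</code> path is just an example; substitute your own scratch directory):

```bash
# Create a target directory and put it on PYTHONPATH so imports resolve
TARGET=/tmp/my-libs-folder        # example path; use e.g. /scratch1/<user>/python
mkdir -p "$TARGET"
export PYTHONPATH="${PYTHONPATH}:${TARGET}"
# pip install geographiclib -t "$TARGET"   # then install into it
```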
==MBRC Cluster==
See [https://wiki.umiacs.umd.edu/umiacs/index.php/MBRC UMIACS MBRC]<br>
===SLURM Job Management===
See [https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/ Harvard RC: Convenient SLURM Commands]<br>
; 1 GPU
<pre>
srun --pty --gres=gpu:1 --mem=16G --qos=high --time=47:59:00 -w mbrc00 bash
</pre>
; 2 GPUs on mbrc00
<pre>
srun --pty --gres=gpu:2 --mem=16G --qos=default --time=23:59:00 -w mbrc00 bash
</pre>
; CPU-only on scavenger QOS
<pre>
srun --pty --account=scavenger --partition=scavenger \
  --time=3:59:00 \
  --mem=1G -c1 -w mbrc00 bash
</pre>
;Notes
* You can add <code>-w mbrc01</code> to pick mbrc01
* Use <code>-c 4</code> to request 4 cores
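Putting those options together, a request for a specific node and core count might look like this (a sketch; the command is only echoed here, and the node name and limits are examples):

```bash
# Compose an srun command line from the options described above
NODE=mbrc01   # pick a node with -w
CORES=4       # request 4 cores with -c
CMD="srun --pty --gres=gpu:1 --mem=16G --qos=default --time=23:59:00 -w ${NODE} -c ${CORES} bash"
echo "$CMD"   # run this on a submission node to get an interactive shell
```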
====See Jobs====
; See my own jobs
<pre>
squeue -u <user> -o "%8i %10P %8j %10u %10L %5b"
</pre>
; Formatting
* <code>%L</code> is the remaining time
* <code>%b</code> is the number of GPUs
; See all jobs
<pre>
squeue
</pre>
===SFTP===
Note: If you know of an easier way, please tell me.

On your PC: start an sshd for forwarding. You can do this in a Docker container for privacy purposes.

On the cluster: generate an sshd host key:
<pre>
ssh-keygen -t ed25519 -a 100 -f /nfshomes/dli7319/ssh/ssh_host_ed25519_key
</pre>
Create the following <code>sshd_config</code> file:
<pre>
# $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $
Port 5981
HostKey /nfshomes/dli7319/ssh/ssh_host_ed25519_key
AuthorizedKeysFile .ssh/authorized_keys
Subsystem sftp /usr/libexec/openssh/sftp-server
</pre>
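If you prefer to script the setup, the same config can be written with a heredoc (a sketch; the <code>/tmp</code> path is an example, and the <code>HostKey</code> path should match wherever you generated your key):

```bash
# Write a minimal sshd_config for the user-level sshd
CONFIG_DIR=/tmp/sshd-demo   # example location; use a directory you own
mkdir -p "$CONFIG_DIR"
cat > "$CONFIG_DIR/sshd_config" <<'EOF'
Port 5981
HostKey /nfshomes/dli7319/ssh/ssh_host_ed25519_key
AuthorizedKeysFile .ssh/authorized_keys
Subsystem sftp /usr/libexec/openssh/sftp-server
EOF
```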
Start the sshd daemon and proxy the port to your local sshd.
You can use a script like this:
<pre>
#!/bin/bash
LOCAL_PORT=5981
REMOTE_PORT=22350
REMOTE_SSH_PORT=22450
REMOTE_ADDR=$(echo "$SSH_CONNECTION" | awk '{print $1}')
/usr/sbin/sshd -D -f sshd_config & \
ssh -R $REMOTE_PORT:localhost:$LOCAL_PORT root@$REMOTE_ADDR -p $REMOTE_SSH_PORT
</pre>
On your PC: proxy the sshd from the local Docker container to your localhost, then connect to the sshd on the cluster.
==Class Accounts==
See [https://wiki.umiacs.umd.edu/umiacs/index.php/ClassAccounts UMIACS Wiki: ClassAccounts]

Class accounts have the lowest priority. If GPUs are available, you can use 1 GPU for up to 48 hours.
However, your home directory only has 18GB, and installing PyTorch takes ~3GB.
You cannot fit a conda environment there, so just use the Python module.

The ssh endpoint is:
<pre>
class.umiacs.umd.edu
</pre>
Start a job with:
<pre>
srun --pty --account=class --partition=class --gres=gpu:1 --mem=16G --qos=default --time=47:59:00 -c4 bash
</pre>
{{hidden | My .bashrc |
<pre>
#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'
# Modules
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
alias python=python3
export PATH="${PATH}:${HOME}/bin/"
export PATH="${PATH}:${HOME}/.local/bin/"
</pre>
}}
==<code>.bashrc</code>==
{{hidden | My .bashrc |
<syntaxhighlight lang="bash">
#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'

if test -f "/opt/rh/rh-php72/enable"; then
  source /opt/rh/rh-php72/enable
fi

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

command_exists() {
  type "$1" &> /dev/null ;
}

# Modules
if command_exists module ; then
  module load tmux
  module load cuda/10.2.89
  module load cudnn/v8.0.4
  module load Python3/3.7.6
  module load git/2.25.1
  module load gitlfs
  module load gcc/8.1.0
  module load openmpi/4.0.1
  module load ffmpeg
  module load rclone
fi

if command_exists python3 ; then
  alias python=python3
  export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"
fi

export PYTHONPATH="${PYTHONPATH}:/nfshomes/dli7319/.local/lib/python3.7/site-packages/"
export PATH="${HOME}/bin/:${PATH}"
</syntaxhighlight>
}}
==Software==
===git===
The MBRC cluster has git available in the modules.<br>
You can then download a [https://github.com/git-lfs/git-lfs/releases compiled git-lfs release] and drop it in <code>~/bin/</code>.<br>
Make sure <code>${HOME}/bin</code> is in your PATH and run <code>git lfs install</code>.<br>
;Notes
* Make sure you have a recent version of git
** E.g. <code>module load git/2.25.1</code>
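The <code>~/bin</code> setup above can be sketched like this (the download and install steps are left as comments since the release URL changes per version):

```bash
# Make a personal bin directory and put it on PATH for the git-lfs binary
mkdir -p "${HOME}/bin"
export PATH="${HOME}/bin:${PATH}"
# Download a git-lfs release tarball, extract it, and copy the binary in, e.g.:
#   tar -xzf git-lfs-linux-amd64-*.tar.gz && cp git-lfs*/git-lfs "${HOME}/bin/"
# Then enable it once:
#   git lfs install
```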
==Copying Files==
There are a few ways I copy files:
* For small files, you can copy to your home directory under <code>/nfshomes/</code> via SFTP to the submission node. I rarely do this because the home directory is only a few gigabytes.
* For large files and folders, I typically use [[rclone]] to copy to the cloud and then copy back to the scratch drives with a CPU-only job.
** You can store project files on Google Drive or the UMIACS object storage.
** Note that Google Drive limits files per second and has a daily transfer limit of 750GB.