UMIACS Servers
Latest revision as of 15:23, 15 June 2023
Notes on using UMIACS servers
Modules
Use modules to load programs you need to run.
- Notes
  - You can load modules in your .bashrc file
# List loaded modules
module list
# Load a module
module load [my_module]
# List all available modules
module avail
Some useful modules in my .bashrc file:
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
module load git
Python
Do not install anaconda in home. You will run out of space.
Load the Python 3 module by adding the following to your .bashrc file:
module load Python3/3.7.6
export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"
Then run the following to get pip installed
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py --user
- Notes
  - You will need to install things with pip --user
  - You may need to add your local site-packages to your PYTHONPATH environment variable
    - Add this to .bashrc:
      export PYTHONPATH="${PYTHONPATH}:/nfshomes/$(whoami)/.local/lib/python3.7/site-packages/"
  - You can also install using pip install --target=/my-libs-folder/
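The hard-coded python3.7 path in the note above stops matching if a different Python module is loaded. A minimal sketch of an alternative for .bashrc, assuming python3 is on the PATH after the module load: ask Python for its user site-packages directory and append it only when it is not already listed. The helper name append_unique is hypothetical.

```shell
# Hypothetical helper: print a colon-separated list with DIR appended,
# unless DIR is already one of its entries.
append_unique() (
  list="$1"
  dir="$2"
  case ":${list}:" in
    *":${dir}:"*) printf '%s\n' "$list" ;;          # already present
    *) printf '%s\n' "${list:+${list}:}${dir}" ;;   # append, no leading colon
  esac
)

# Only touch PYTHONPATH when python3 is actually available.
if command -v python3 >/dev/null 2>&1; then
  user_site="$(python3 -c 'import site; print(site.getusersitepackages())')"
  PYTHONPATH="$(append_unique "${PYTHONPATH:-}" "$user_site")"
  export PYTHONPATH
fi
```

Because the append is skipped when the entry already exists, sourcing .bashrc repeatedly (for example inside tmux panes) should not keep growing PYTHONPATH.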
Conda
If you must install conda, install it somewhere with a lot of space like scratch.
Install PyTorch
pip install --user torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
Installing Packages to a Directory
pip install geographiclib -t /scratch1/davidli/python/
MBRC Cluster
See UMIACS MBRC
SLURM Job Management
See https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/
- 1 GPU
srun --pty --gres=gpu:1 --mem=16G --qos=high --time=47:59:00 -w mbrc00 bash
- 2 GPUs on mbrc00
srun --pty --gres=gpu:2 --mem=16G --qos=default --time=23:59:00 -w mbrc00 bash
- CPU-only on scavenger QOS
srun --pty --account=scavenger --partition=scavenger \
  --time=3:59:00 \
  --mem=1G -c1 -w mbrc00 bash
- Notes
  - You can add -w mbrc01 to pick mbrc01
  - You can add -c 4 for 4 cores
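The srun variants above differ only in GPU count, time limit, core count and node. As a sketch, a small function can assemble the command line from those four values. The name srun_cmd and its argument order are hypothetical; --mem is fixed at 16G as in the examples, and --qos, --account and --partition are left out, so add them as needed.

```shell
# Hypothetical helper: print an interactive srun command line.
# Usage: srun_cmd GPUS HOURS CORES [NODE]
srun_cmd() (
  gpus="${1:-1}"
  hours="${2:-4}"
  cores="${3:-1}"
  node="${4:-}"
  cmd="srun --pty --mem=16G --time=${hours}:00:00 -c ${cores}"
  # Request GPUs only when asked for; 0 gives a CPU-only shell.
  if [ "$gpus" -gt 0 ]; then
    cmd="${cmd} --gres=gpu:${gpus}"
  fi
  # Pin to a specific node only when one is named.
  if [ -n "$node" ]; then
    cmd="${cmd} -w ${node}"
  fi
  printf '%s bash\n' "$cmd"
)

# Example: 1 GPU, 8 hours, 4 cores on mbrc01
srun_cmd 1 8 4 mbrc01
```

The function only prints the command, so you can inspect it before pasting it into the shell or running it with eval "$(srun_cmd 1 8 4 mbrc01)".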
See Jobs
- See my own jobs
squeue -u <user> -o "%8i %10P %8j %10u %10L %5b"
- Formatting
  - %L is remaining time
  - %b is the number of GPUs
- See all jobs
squeue
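The %L column is printed as days-hours:minutes:seconds, with the leading parts dropped when they are zero. If you want to act on it in a script (for example, to warn before a session expires), a sketch like the following converts it to whole minutes. The function name time_left_to_minutes is hypothetical, and non-time values such as UNLIMITED simply come out as 0 here.

```shell
# Hypothetical helper: convert a squeue %L value to whole minutes.
# Accepts MM:SS, HH:MM:SS, or D-HH:MM:SS; seconds are dropped.
time_left_to_minutes() (
  printf '%s\n' "$1" | awk -F'[-:]' '
    {
      d = 0; h = 0; m = 0
      if ($0 ~ /-/)      { d = $1; h = $2; m = $3 }   # D-HH:MM:SS
      else if (NF == 3)  { h = $1; m = $2 }           # HH:MM:SS
      else if (NF == 2)  { m = $1 }                   # MM:SS
      print d * 1440 + h * 60 + m
    }'
)
```

One possible use is time_left_to_minutes "$(squeue -h -j <jobid> -o %L)", where <jobid> is the job you are interested in.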
SFTP
Note: If you know of an easier way, please tell me.
On your PC
Start an sshd for forwarding. You can do this in a docker container for privacy purposes.
On the cluster:
Generate an sshd host key:
ssh-keygen -t ed25519 -a 100 -f /nfshomes/dli7319/ssh/ssh_host_ed25519_key
Create the following sshd_config file:
# $OpenBSD: sshd_config,v 1.103 2018/04/09 20:41:22 tj Exp $
Port 5981
HostKey /nfshomes/dli7319/ssh/ssh_host_ed25519_key
AuthorizedKeysFile .ssh/authorized_keys
Subsystem sftp /usr/libexec/openssh/sftp-server
Start the sshd daemon and proxy the port to your local sshd. You can make a script like this:
#!/bin/bash
LOCAL_PORT=5981
REMOTE_PORT=22350
REMOTE_SSH_PORT=22450
REMOTE_ADDR=$(echo "$SSH_CONNECTION" | awk '{print $1}')
/usr/sbin/sshd -D -f sshd_config & \
ssh -R $REMOTE_PORT:localhost:$LOCAL_PORT root@$REMOTE_ADDR -p $REMOTE_SSH_PORT
On your PC:
Proxy the sshd from the local docker to your localhost.
Connect to the sshd on the cluster
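For the final step, a sketch of the connection command, assuming the forwarded port (REMOTE_PORT, 22350 in the script above) is reachable on your PC's localhost and that your public key is in ~/.ssh/authorized_keys on the cluster. The helper names run and cluster_sftp are hypothetical; setting DRY_RUN=1 prints the command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: open an SFTP session through the forwarded port.
# $1: cluster username, $2: forwarded port on localhost (default 22350)
cluster_sftp() {
  run sftp -P "${2:-22350}" "${1}@localhost"
}
```

Call it as cluster_sftp <user>, or as DRY_RUN=1 cluster_sftp <user> to check the command first. Note that sftp takes the port with an uppercase -P, unlike ssh.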
Class Accounts
See UMIACS Wiki: ClassAccounts
Class accounts have the lowest priority. If GPUs are available, you can use 1 GPU for up to 48 hours.
However, your home disk only has 18GB and installing PyTorch takes up ~3GB.
You cannot fit a conda environment in here so just use the python module.
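With so little home space, it helps to check where it is going. A minimal sketch, assuming pip-installed packages live under ~/.local and pip's download cache under ~/.cache; the helper name dir_usage_kb is hypothetical.

```shell
# Hypothetical helper: print the size of a directory in KiB (0 if it does not exist).
dir_usage_kb() (
  if [ -d "$1" ]; then
    du -sk "$1" | awk '{print $1}'
  else
    echo 0
  fi
)

# Report the usual space hogs under the home directory.
for d in "$HOME/.local" "$HOME/.cache"; do
  printf '%s\t%s KiB\n' "$d" "$(dir_usage_kb "$d")"
done
```

If the cache turns out to be large, installing with pip install --no-cache-dir avoids filling it in the first place.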
The ssh endpoint is
class.umiacs.umd.edu
Start a job with:
srun --pty --account=class --partition=class --gres=gpu:1 --mem=16G --qos=default --time=47:59:00 -c4 bash
#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'
# Modules
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
alias python=python3
export PATH="${PATH}:${HOME}/bin/"
export PATH="${PATH}:${HOME}/.local/bin/"
.bashrc
#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'
if test -f "/opt/rh/rh-php72/enable"; then
source /opt/rh/rh-php72/enable
fi
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
command_exists() {
type "$1" &> /dev/null ;
}
# Modules
if command_exists module ; then
module load tmux
module load cuda/10.2.89
module load cudnn/v8.0.4
module load Python3/3.7.6
module load git/2.25.1
module load gitlfs
module load gcc/8.1.0
module load openmpi/4.0.1
module load ffmpeg
module load rclone
fi
if command_exists python3 ; then
alias python=python3
fi
if command_exists python3 ; then
export PATH="${PATH}:$(python3 -c 'import site; print(site.USER_BASE)')/bin"
fi
export PYTHONPATH="${PYTHONPATH}:/nfshomes/dli7319/.local/lib/python3.7/site-packages/"
export PATH="${HOME}/bin/:${PATH}"
Software
git
The MBRC cluster has git available as a module.
You can then download a precompiled git-lfs binary and drop it in ~/bin/.
Make sure ${HOME}/bin is in your PATH and run git lfs install
- Notes
  - Make sure you have a recent version of git
    - E.g. module load git/2.25.1
Copying Files
There are a few ways that I copy files:
- For small files, you can copy to your home directory under /nfshomes/ via SFTP to the submission node. I rarely do this because the home directory is only a few gigabytes.
- For large files and folders, I typically use rclone to copy to the cloud and then copy back to the scratch drives with a CPU-only job.
  - You can store project files on Google Drive or the UMIACS object storage.
  - Note that Google Drive has a limit on files per second and a daily limit of 750GB in transfers.
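A sketch of an rclone copy that tries to stay within those limits. It assumes a remote named gdrive has already been set up with rclone config; that name, the helper names run and rclone_copy_limited, and the specific values for --transfers and --tpslimit are illustrative rather than recommended settings. --max-transfer stops the run once the given amount has been transferred.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy SRC to DST with conservative rate limits.
# $1: source (local path or remote:path), $2: destination
rclone_copy_limited() {
  run rclone copy "$1" "$2" \
    --transfers 4 --tpslimit 8 --max-transfer 750G --progress
}
```

For example, rclone_copy_limited ./dataset gdrive:projects/dataset uploads from your PC, and the same call with the arguments swapped, run inside a CPU-only job, copies the data back onto a scratch drive.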