UMIACS Servers

 


==Python==
Do not install Anaconda in your home directory; you will run out of space.<br>
Load the Python 3 module by adding the following to your <code>.bashrc</code> file:
<syntaxhighlight lang="bash">
module load Python3/3.7.6
</syntaxhighlight>
** <code>export PYTHONPATH="${PYTHONPATH}:/nfshomes/$(whoami)/.local/lib/python3.7/site-packages/"</code>
* You can also install packages into an alternate directory with <code>pip install --target=/my-libs-folder/</code>
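The bullets above can be sketched end to end; the target directory and package name here are placeholder assumptions, not cluster-specific paths:

```shell
# Hedged sketch: any directory behaves like site-packages once it is on
# PYTHONPATH. "$HOME/mylibs" and "requests" are placeholder assumptions.
mkdir -p "$HOME/mylibs"
pip install --target="$HOME/mylibs" requests
export PYTHONPATH="${PYTHONPATH}:$HOME/mylibs"
python3 -c "import requests"   # now resolves from $HOME/mylibs
```

This is the same mechanism as the <code>--user</code> install plus <code>PYTHONPATH</code> export above, just pointed at a directory with more space.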
===Conda===
If you must install conda, install it somewhere with plenty of space, such as a scratch directory.
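A minimal sketch of such an install, assuming Miniconda and a scratch mount like <code>/scratch0</code> (both assumptions; substitute your actual scratch path):

```shell
# Hedged sketch: install Miniconda under scratch instead of home.
# The scratch path below is an assumption; adjust it for your account.
SCRATCH="/scratch0/$(whoami)"
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$SCRATCH/miniconda3"
```

<code>-b</code> runs the installer non-interactively and <code>-p</code> sets the install prefix, so nothing lands in your home quota.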


===Install PyTorch===
</pre>
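With the Python 3 module loaded, a user-level pip install along these lines is a reasonable sketch (the package set is an assumption; match the wheel to whichever CUDA module you load):

```shell
# Hedged sketch: install PyTorch into ~/.local. This counts against
# your home quota (roughly ~3GB for PyTorch).
pip install --user torch torchvision
python3 -c "import torch; print(torch.cuda.is_available())"
```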


; CPU-only on scavenger QOS
<pre>
srun --pty --account=scavenger --partition=scavenger \
    --time=3:59:00 \
    --mem=1G -c1 -w mbrc00 bash
</pre>


Subsystem sftp /usr/libexec/openssh/sftp-server
</pre>
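For reference, a minimal user-level <code>sshd_config</code> might look like this (the port and key paths are assumptions; only the <code>Subsystem</code> line comes from above):

<pre>
Port 5981
HostKey /nfshomes/youruser/.ssh/ssh_host_ed25519_key
PidFile /nfshomes/youruser/.ssh/sshd.pid
Subsystem sftp /usr/libexec/openssh/sftp-server
</pre>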
Start the sshd daemon and proxy the port to your local sshd.
You can make a script like this:
<pre>
#!/bin/bash
 
LOCAL_PORT=5981
REMOTE_PORT=22350
REMOTE_SSH_PORT=22450
REMOTE_ADDR=$(echo "$SSH_CONNECTION" | awk '{print $1}')
 
# start sshd in the background, then open the reverse tunnel
/usr/sbin/sshd -D -f sshd_config &
ssh -R "$REMOTE_PORT:localhost:$LOCAL_PORT" "root@$REMOTE_ADDR" -p "$REMOTE_SSH_PORT"
</pre>


On your PC:
* Proxy the sshd from the local docker to your localhost.
* Connect to the sshd on the cluster.
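The two PC-side steps might look like this, assuming the docker container's sshd listens on port 22450 and the tunnel ports match the script above (the local port 2222 and the usernames are assumptions):

```shell
# Hedged sketch of the PC side; ports mirror the script above.
# 1. Proxy the reverse-tunnel port (22350) out of the local docker sshd:
ssh -N -L 2222:localhost:22350 root@localhost -p 22450 &
# 2. Connect through it to the sshd running on the cluster node:
ssh -p 2222 clusteruser@localhost
```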
==Class Accounts==
See [https://wiki.umiacs.umd.edu/umiacs/index.php/ClassAccounts UMIACS Wiki: ClassAccounts] 
Class accounts have the lowest priority. If GPUs are available, you can use one GPU for up to 48 hours.
However, your home disk is only 18GB, and installing PyTorch takes up ~3GB.
A conda environment will not fit there, so just use the Python module.
The ssh endpoint is
<pre>
class.umiacs.umd.edu
</pre>
Start a job with:
<pre>
srun --pty --account=class --partition=class --gres=gpu:1 --mem=16G --qos=default --time=47:59:00 -c4 bash
</pre>
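Before launching, standard Slurm queries can show what is free (these are generic Slurm commands, not specific to this cluster):

```shell
sinfo -p class          # node states in the class partition
squeue -u "$(whoami)"   # your own pending and running jobs
```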
{{hidden | My .bashrc |
<pre>
#PS1='\w$ '
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$'
# Modules
module load tmux
module load cuda/10.0.130
module load cudnn/v7.5.0
module load Python3/3.7.6
alias python=python3
export PATH="${PATH}:${HOME}/bin/"
export PATH="${PATH}:${HOME}/.local/bin/"
</pre>
}}


==<code>.bashrc</code>==
if command_exists module ; then
   module load tmux
   module load cuda/10.2.89
   module load cudnn/v8.0.4
   module load Python3/3.7.6
   module load git/2.25.1
   module load gitlfs
   module load gcc/8.1.0
   module load openmpi/4.0.1
   module load ffmpeg
   module load rclone
fi
if command_exists python3 ; then
* Make sure you have a recent version of git
** E.g. <code>module load git/2.25.1</code>
==Copying Files==
There are 3 ways that I use to copy files:
* For small files, you can copy to your home directory under <code>/nfshomes/</code> via SFTP to the submission node. I rarely do this because the home directory is only a few gigabytes.
* For large files and folders, I typically use [[rclone]] to copy to the cloud and then copy back to the scratch drives with a CPU-only job.
** You can store project files on Google Drive or the UMIACS object storage.
** Note that Google Drive has a limit on files per second and a daily limit of 750GB in transfers.
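A sketch of the rclone round trip, assuming a remote named <code>gdrive</code> already configured via <code>rclone config</code>, with placeholder paths:

```shell
# Hedged sketch: push from scratch to the cloud, then pull it back down
# inside a CPU-only job. "gdrive" and both paths are assumptions.
rclone copy "/scratch0/$(whoami)/dataset" gdrive:backup/dataset --progress
rclone copy gdrive:backup/dataset "/scratch1/$(whoami)/dataset" --progress
```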