# Cluster Structure
## Login nodes
- There are three login nodes: `cbsulogin`, `cbsulogin2`, and `cbsulogin3`.
- They are used for submitting jobs or requesting interactive sessions.
- Use `ssh` to connect to login nodes (Cornell VPN is not necessary); see the example below.
- Their full domain names follow the pattern `<node name>.biohpc.cornell.edu`, e.g. `ssh netid@cbsulogin.biohpc.cornell.edu`.
- Don't use them for computation.
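A typical connection from your local terminal might look like this (replace `netid` with your own Cornell NetID):

```bash
# Connect to a login node; no VPN is needed for the login nodes.
ssh netid@cbsulogin.biohpc.cornell.edu

# The other two login nodes work the same way:
# ssh netid@cbsulogin2.biohpc.cornell.edu
# ssh netid@cbsulogin3.biohpc.cornell.edu
```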
## Computing nodes
### General Info
- Currently, there are a total of 18 computing nodes, which are named `cbsubscb01`-`cbsubscb17` and `cbsubscbgpu01`.
- Their specs range from 32 to 64 physical cores and 256 to 1003 GB of RAM.
- You can access these computing nodes from the login node by submitting a job using `sbatch` OR by requesting an interactive session using `salloc` (see the sketch after this list).
- Each node can also be accessed directly through `ssh`.
- You can't run computationally intensive jobs in these `ssh` sessions; they are only for tasks such as job submission, interactive session requests, file lookup, and monitoring.
- Unlike the login nodes, accessing the computing nodes directly requires a Cornell network connection (i.e. VPN if you are off campus).
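As a sketch, the two access routes from a login node might look like this (`my_job.sh` is a hypothetical job script; adjust the resource requests to your needs):

```bash
# Submit a batch job to the scheduler.
sbatch my_job.sh

# Or request an interactive session, e.g. 4 tasks and 16 GB of RAM
# for 8 hours.
salloc --nodes=1 --ntasks=4 --mem=16G --time=8:00:00
```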
### Local storage
- Each computing node has local permanent storage, and each node is assigned to a different lab group.
- The computing node assigned to the Therkildsen lab for permanent storage is `cbsubscb16`.
- The local storage of each node is located at the directory `/local/storage`.
- The local storage of each node can be mounted to any other node using the command `/programs/bin/labutils/mount_server node_name /storage`, e.g. `/programs/bin/labutils/mount_server cbsubscb16 /storage`.
- The mounted storage then becomes available under the directory `/fs/node_name/storage/`.
- This mounting step is usually one of the first things we do in a job script so that your input files can be accessed from any of the computing nodes (see the sketch below).
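A minimal job-script skeleton with this mounting step might look like the following (the `#SBATCH` options are placeholders; adjust them to your job):

```bash
#!/bin/bash
#SBATCH --job-name=example   # hypothetical job name
#SBATCH --ntasks=1

# Mount the lab storage node so input files are reachable regardless
# of which computing node the job lands on.
/programs/bin/labutils/mount_server cbsubscb16 /storage

# Files are now visible under /fs/cbsubscb16/storage/
ls /fs/cbsubscb16/storage/
```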
### Scratch storage
- Each computing node has scratch storage, ranging from 1 to 2 TB in capacity.
- It is located under `/workdir/` (and `/SSD/` on some nodes).
- It is recommended to copy your input files from network-mounted storage space to this scratch storage for computing, especially for I/O-heavy jobs (see the sketch below).
- The scratch storage is shared by all users, so after you finish your job, make sure to copy your output files to your permanent storage space and clear the scratch storage space.
- Any files that were not removed by users will be removed automatically when the node receives a new job.
- Users who have jobs running on the node will not have their associated files removed from the scratch space.
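Inside a job script, the copy-in, compute, copy-out pattern might look like this (file names are hypothetical):

```bash
# Work in a user- and job-specific scratch directory to avoid clashing
# with other users.
WORKDIR=/workdir/$USER/$SLURM_JOB_ID
mkdir -p $WORKDIR
cd $WORKDIR

# Copy input from mounted permanent storage to fast local scratch.
cp /fs/cbsubscb16/storage/my_input.fastq.gz .

# ... run your analysis here ...

# Copy results back to permanent storage, then clean up scratch.
cp my_output.txt /fs/cbsubscb16/storage/
rm -rf $WORKDIR
```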
## Storage servers
- There are three storage servers, which together have a capacity of 281 TB.
- They should not be accessed directly.
- Instead, they are network mounted on all BSCB machines under `/bscb/`, in which each lab group has a subfolder (see the example below).
- They are not mounted to the `nt246` server yet.
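For example, on any BSCB machine you should be able to list the lab-group subfolders directly:

```bash
# List lab-group subfolders on the network-mounted storage servers.
ls /bscb/
```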
## Home directory
- You will always have the same home directory. It is mounted on all CBSU servers.
- It has limited storage space and should not be used for computing or storage.
## Cluster partitions
| Partition | Job Time Limit | Nodes | Slots |
|---|---|---|---|
| short | 4 hours | cbsubscb[01-15], cbsubscbgpu01 | 1392 |
| regular | 24 hours | cbsubscb[01-15], cbsubscbgpu01 | 435 |
| long7 | 7 days | cbsubscb[01-15] | 437 |
| long30 | 30 days | cbsubscb[01-15] | 500 (limit 270 per user) |
| gpu | 3 days | cbsubscbgpu01 | 32 + 2 GPUs |
- Any job submitted from the login node will be placed in a queue.
- The cluster has five different queues (or partitions), and each has a different time limit (see the table above).
- The partition to which a job is submitted can be specified with the `--partition` or `-p` option; the default is the `short` partition (see the example after the table below).
- Note that there are a total of 1392 slots (i.e. the number of tasks that can be requested), so all slots are available for jobs submitted to the `short` partition, and only a subset of them are available for the other partitions.
- Please use the following table to determine which partition(s) to submit your jobs to:
| Intended job duration | Partition specification | Slots available |
|---|---|---|
| up to 4 hours | --partition=short | 1392 |
| 4 - 24 hours | --partition=regular,long7,long30 | 1372 |
| 24 hours - 7 days | --partition=long7,long30 | 937 |
| 7 days - 30 days | --partition=long30 | 500 |
| GPU, up to 36 hours | --partition=gpu --gres=gpu:tP100:1 | 32 |
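For example, the partition choice might be specified on the `sbatch` command line like this (`my_job.sh` and `my_gpu_job.sh` are hypothetical job scripts):

```bash
# A job expected to take up to 3 days: make it eligible for both the
# long7 and long30 partitions so it starts on whichever has free slots.
sbatch --partition=long7,long30 --time=3-00:00:00 my_job.sh

# A GPU job requesting one P100 GPU on the gpu partition.
sbatch --partition=gpu --gres=gpu:tP100:1 my_gpu_job.sh
```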