Cluster Structure
Login nodes
- There are three login nodes: `cbsulogin`, `cbsulogin2`, and `cbsulogin3`.
- They are used for submitting jobs or requesting interactive sessions.
- Use `ssh` to connect to login nodes (Cornell VPN is not necessary).
  - Their full hostnames end in `.biohpc.cornell.edu`, e.g. `cbsulogin.biohpc.cornell.edu`.
  - E.g. `ssh netid@cbsulogin.biohpc.cornell.edu`
- Don't use them for computation.
Computing nodes
General Info
- Currently, there are a total of 18 computing nodes, named `cbsubscb01`-`cbsubscb17` and `cbsubscbgpu01`.
- Their specs range from 32 to 64 physical cores and 256 GB to 1003 GB of RAM.
- You can access these computing nodes from the login node by submitting a job using `sbatch` OR by requesting an interactive session using `salloc` (see the sketch after this list).
- Each node can also be accessed directly through `ssh`.
  - You can't run computationally intensive jobs in these sessions.
  - This is only for tasks such as job submission, interactive session requests, file lookup, and monitoring.
- Unlike the login nodes, accessing the computing nodes directly requires a Cornell network connection (i.e. VPN if you are off campus).
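As a rough illustration of the two access routes, the sketch below requests an interactive session with `salloc` and submits a batch script with `sbatch`. The resource values and the script name `my_job.sh` are placeholders for illustration, not cluster defaults.

```bash
# Request an interactive session on a computing node (resource values are illustrative)
salloc --partition=short --nodes=1 --ntasks=1 --mem=8G --time=2:00:00

# Or submit a batch job script to the scheduler
sbatch --partition=regular my_job.sh
```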
Local storage
- Each computing node has local permanent storage, and each node is assigned to a different lab group.
- The computing node assigned to the Therkildsen lab for permanent storage is `cbsubscb16`.
- The local storage of each node is located at the directory `/local/storage`.
- The local storage of each node can be mounted on any other node using the command `/programs/bin/labutils/mount_server node_name /storage`.
  - E.g. `/programs/bin/labutils/mount_server cbsubscb16 /storage`
- The mounted storage then becomes available under the directory `/fs/node_name/storage/`.
- This mounting step is usually one of the first things we do in a job script so that your input files can be accessed from any of the computing nodes (see the sketch below).
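A minimal job-script sketch of this pattern, assuming the Therkildsen lab node `cbsubscb16`; the job name and partition in the `#SBATCH` header are hypothetical:

```bash
#!/bin/bash -l
#SBATCH --job-name=my_analysis   # hypothetical job name
#SBATCH --partition=regular

# Mount the lab's permanent storage (here, cbsubscb16) on whichever node runs the job
/programs/bin/labutils/mount_server cbsubscb16 /storage

# Input files then become reachable under the /fs path
ls /fs/cbsubscb16/storage/
```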
Scratch storage
- Each computing node has scratch storage, ranging from 1 to 2 TB in capacity.
- It is located under `/workdir/` (and `/SSD/` on some nodes).
- It is recommended to copy your input files from network-mounted storage to this scratch storage for computing, especially for I/O-heavy jobs.
- The scratch storage is shared by all users, so after you finish your job, make sure to copy your output files to your permanent storage space and clear the scratch space (see the sketch below).
- Any files that were not removed by users will be removed automatically when the node receives a new job.
- Users who have jobs running on the node will not have their associated files removed from the scratch space.
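A sketch of the stage-in/stage-out pattern, assuming a per-user scratch subdirectory such as `/workdir/$USER` and placeholder file names:

```bash
# Stage inputs onto the node's local scratch space (file names are placeholders)
mkdir -p /workdir/$USER
cp /fs/cbsubscb16/storage/my_input.fastq.gz /workdir/$USER/

# ... run the computation against the copies in /workdir/$USER ...

# Copy results back to permanent storage and clear the scratch space
cp /workdir/$USER/my_output.txt /fs/cbsubscb16/storage/
rm -rf /workdir/$USER
```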
Storage servers
- There are three storage servers, which together have a capacity of 281 TB.
- They should not be accessed directly.
- Instead, they are network mounted on all BSCB machines under `/bscb/`, in which each lab group has a subfolder.
- They are not mounted on the `nt246` server yet.
Home directory
- You will always have the same home directory. It is mounted on all CBSU servers.
- It has limited storage space and should not be used for computing or bulk storage.
Cluster partitions
Partition | Job Time Limit | Nodes | Slots |
---|---|---|---|
short | 4 hours | cbsubscb[01-15], cbsubscbgpu01 | 1392 |
regular | 24 hours | cbsubscb[01-15], cbsubscbgpu01 | 435 |
long7 | 7 days | cbsubscb[01-15] | 437 |
long30 | 30 days | cbsubscb[01-15] | 500 (limit 270 per user) |
gpu | 3 days | cbsubscbgpu01 | 32 + 2 GPUs |
- Any job submitted from the login node goes into a queue.
- The cluster has five different queues (or partitions), each with a different time limit (see the table above).
- The partition to which a job is submitted can be specified with the `--partition` or `-p` option, with the default being the `short` partition.
- Note that there are a total of 1392 slots (i.e. the number of tasks that can be requested). All slots are available for jobs submitted to the `short` partition, and only a subset of them are available for the other partitions.
- Please use the following table to determine which partition(s) to submit your jobs to (a job-script sketch follows the table):
Intended job duration | Partition specification | Slots available |
---|---|---|
up to 4 hours | --partition=short | 1392 |
4 - 24 hours | --partition=regular,long7,long30 | 1372 |
24 hours - 7 days | --partition=long7,long30 | 937 |
7 days - 30 days | --partition=long30 | 500 |
GPU, up to 36 hours | --partition=gpu --gres=gpu:tP100:1 | 32 |
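For example, a job expected to take between 4 and 24 hours could be submitted with a header along these lines; the job name, task count, memory, and time values are placeholders chosen for illustration:

```bash
#!/bin/bash -l
#SBATCH --job-name=my_job                  # hypothetical job name
#SBATCH --partition=regular,long7,long30   # intended duration: 4-24 hours
#SBATCH --ntasks=8                         # illustrative resource request
#SBATCH --mem=16G
#SBATCH --time=20:00:00

# ... commands to run ...
```

Submitting the script with `sbatch my_job.sh` then lets the scheduler start it in whichever of the listed partitions can run it first.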