FAQ
Welcome to the Darwin Compute Facility FAQ page. Please feel free to edit this page and to add questions and answers. Send your own questions to User Support.
Available hardware
What hardware is available?
A series of web pages describes the
- compute cluster hardware and software
- storage hardware and software
- display wall hardware and software
- networking hardware and software
Getting an account
When will I be able to get an account?
Our goal is to be ready for account requests by November 1. We will post a notice prominently on the Darwin compute facility home page announcing this.
Who is eligible for an account?
The Darwin compute facility is dedicated to research into marine ecosystems from cellular to global scales. The compute facility operates primarily to support the research activities of the [Darwin Project], and accounts are available to anyone affiliated with those activities.
How do I get an account?
Complete the online account request form at http://www.darwinproject.mit.edu/accounts.
Logging in
What is the name/address of the cluster?
The cluster is named beagle.darwinproject.mit.edu.
How do I log into the cluster?
The primary access mode for the compute cluster is secure login via ssh. Most (if not all) Macintosh, Linux, and Unix systems include a command-line ssh application. Simply open up a terminal or console and type ssh beagle.darwinproject.mit.edu -l your_login. Windows users can download SecureCRT from MIT IS&T.
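If you log in often, an ssh host alias saves typing. The following is a minimal sketch, not an official setup step: `add_beagle_alias` is a hypothetical helper name, the alias `beagle` is arbitrary, and `your_login` stands in for your own account name.

```shell
# Append a "beagle" host alias to an ssh client config file.
#   $1 = your login name on the cluster (placeholder in the usage below)
#   $2 = config file to write to (defaults to ~/.ssh/config)
add_beagle_alias() {
    login="$1"
    config="${2:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")"
    printf 'Host beagle\n    HostName beagle.darwinproject.mit.edu\n    User %s\n' \
        "$login" >> "$config"
}

# Usage:
#   add_beagle_alias your_login
#   ssh beagle
```

After the alias is in place, `ssh beagle` should behave like the full command given above.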
How do I change my password?
To change your password use the passwd command on the head node. The change should propagate automatically to all the compute nodes after about one hour. However, you should not need to use a password to access the compute nodes.
How can I stop the system asking for a password to access each compute node?
See here or here to get on the right track.
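One common approach is to authorize your own ssh key for your own account. The sketch below assumes that your home directory is shared by all nodes and that the nodes accept public-key logins; neither is confirmed on this page. `setup_node_key` is a hypothetical helper name. Note that a key without a passphrase is convenient but less secure.

```shell
# Create a passphrase-less RSA key pair (unless one already exists) and
# authorize it for logins to the same account.
#   $1 = ssh directory to use (defaults to ~/.ssh)
setup_node_key() {
    dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir" && chmod 700 "$dir"
    [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
    chmod 600 "$dir/authorized_keys"
}

# Usage (on the head node):
#   setup_node_key
#   ssh c1-15 uname -n    # should no longer prompt for a password
```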
Policies
Is there a weekly maintenance period?
Yes. Wednesday mornings are reserved for weekly maintenance and housekeeping. Not every Wednesday slot is used, but you should not expect the machine to be 100% available during this period. (Note: it may be down at other times too, but hopefully not too often!)
Is there a limit to how much of the machine I can use?
If you start swamping the system then other people trying to use the system will, and should feel free to, complain. If you have a very large set of work to do then please feel free to contact the user advisory group requesting some special access. As far as possible, and given some notice, people are happy to accommodate special circumstances, but just going for it, without checking first, and slowing everyone else down is not OK.
Running jobs
How do I run a job on a compute node?
The cluster is made up of 130 computers, only one of which (the master node) is connected to the outside world. To utilize the power of the cluster, most tasks should be performed on one of the compute nodes. A queue system, called SGE, manages resource usage, and all programs should be run through it. An example SGE job with MPI can be found here.
What are the nodes named?
The nodes are all aliased with a name of the form cX-Y where X is a rack number from 1 to 4 and Y is a node number from 0 to 31. Some example names are c1-15, c3-2, and c4-31.
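The naming scheme is regular enough to generate in a loop, which can be handy in scripts. A small sketch follows; `list_nodes` is a hypothetical helper name, and the rack and node ranges are taken from the description above.

```shell
# Print every compute-node alias, one per line: c1-0 ... c4-31.
list_nodes() {
    for rack in 1 2 3 4; do
        node=0
        while [ "$node" -le 31 ]; do
            echo "c${rack}-${node}"
            node=$((node + 1))
        done
    done
}

# Usage: check that each node answers (assumes password-less ssh is set up).
#   for n in $(list_nodes); do ssh "$n" uname -n; done
```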
Can I see which nodes are currently being used the least/most?
The ganglia page shows the current usage stats for each node. When SGE is operational, the qstat command will list the jobs being run on each node. Also see the cluster load section of Getting started with SGE.
Using the queuing system
How do I submit jobs to the queue?
Use the qsub or qrsh commands. The Getting started with SGE page has more details.
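As a rough illustration, a minimal serial job might look like the sketch below. The file name `hello_job.sh` and the job name `hello` are made up for this example, and the directives shown are standard SGE options rather than settings specific to this cluster. Parallel (MPI) jobs need further site-specific options that are not shown here.

```shell
# Write a minimal serial job script. Lines starting with "#$" are SGE directives.
cat > hello_job.sh <<'EOF'
#!/bin/sh
# Job name shown by qstat
#$ -N hello
# Run the job from the directory it was submitted from
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Shell used to interpret the job script
#$ -S /bin/sh
echo "Running on $(uname -n)"
EOF

# Submit it from the head node:
#   qsub hello_job.sh
# Or run a single command interactively on a compute node:
#   qrsh uname -n
```

With these options, the job's output should appear in a file in the submission directory once the job has finished.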
Compilers and other development tools
What development tools are available on the cluster?
The list of compilers and libraries installed can be found on the available software page.
Disk storage
How much disk space is there?
There is 500TB of raw disk space available. Like the computing power, it is distributed across the nodes and managed using [GPFS], a file system developed by IBM for distributed computing. Trust us, it's pretty cool.
What storage is available?
The bulk of the 500TB is split into 3 drives for general use.
- /home (9TB) - User directories are here. This is the safest place to put your data. Data is duplicated on the cluster and will be backed up to tape.
- /data (44TB) - This drive is also mirrored, but not backed up.
- /scratch (~300TB) - This drive has some redundancy, but is not fully mirrored. It can recover from a single error, but a run of bad luck could destroy data. Use this for scratch files. At present only ~120TB of the /scratch storage is available, pending a firmware upgrade. We expect this upgrade to be available at the end of 2007.
Is anything backed up?
Not quite yet, but we have a tape drive and will be using it on /home once it's up and running.
How do I check my storage quota?
The GPFS command mmlsquota reports how much of your disk quota you have used. To include mmlsquota in your search path, load the gpfs module. The output from mmlsquota is, unfortunately, somewhat cryptic, e.g.:
[charles@beagle ~]$ module load gpfs
[charles@beagle ~]$ mmlsquota
                         Block Limits                          |              File Limits
Filesystem type         KB  quota       limit  in_doubt  grace |  files  quota  limit  in_doubt  grace Remarks
gpfsFS1    USR  no limits
gpfsFS12   USR  no limits
gpfsFS2    USR   310449536      0  2097152000         0   none |      2      0      0         0   none
gpfsFS3    USR      395734      0   104857600     10560   none |   9761      0      0        32   none
gpfsFS5    USR  no limits
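To make this output easier to read, it can be piped through a small filter. The sketch below assumes the column layout of the sample output above (third field is KB used, fifth field is the block limit in KB). `quota_summary` is a hypothetical helper name, and "GB" here means 1048576 KB.

```shell
# Summarise mmlsquota output: one line per file system that has a block limit.
# Header lines and "no limits" rows are skipped.
quota_summary() {
    awk '$2 == "USR" && $3 ~ /^[0-9]+$/ && $5 > 0 {
        printf "%s: %.1f GB used of %.1f GB limit (%.1f%%)\n",
               $1, $3 / 1048576, $5 / 1048576, 100 * $3 / $5
    }'
}

# Usage:
#   module load gpfs
#   mmlsquota | quota_summary
```

For the sample output above, this would print:

```
gpfsFS2: 296.1 GB used of 2000.0 GB limit (14.8%)
gpfsFS3: 0.4 GB used of 100.0 GB limit (0.4%)
```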
Transferring data on and off
Use scp or sftp.
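For example, the following sketch wraps scp in two small helpers. The names `beagle_put` and `beagle_get`, the login `your_login`, and the per-user directories in the usage lines are placeholders, not confirmed paths on the cluster.

```shell
# Account and host used for transfers; replace your_login with your own login.
BEAGLE="your_login@beagle.darwinproject.mit.edu"

# Copy a local file or directory ($1) to a path on the cluster ($2).
beagle_put() { scp -r "$1" "$BEAGLE:$2"; }

# Copy a file or directory on the cluster ($1) to a local path ($2).
beagle_get() { scp -r "$BEAGLE:$1" "$2"; }

# Usage (paths are illustrative):
#   beagle_put results.tar.gz /scratch/your_login/
#   beagle_get /home/your_login/run.log .
# For an interactive session, use sftp instead:
#   sftp your_login@beagle.darwinproject.mit.edu
```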
Specific applications
MITgcm
Where can I find examples of how to compile and run?
What datasets are available?
BLAST
What databases are available?
The NCBI databases (nr, nt, env_nr, and env_nt) and some GOS data are installed on the cluster. See the BLAST section of the Bioinformatics tools and tips page for details.
