CLASSE Data Storage and Management
Approximately 100TB of direct-attached storage on Scientific Linux servers.
- All hard drives are SATA, SAS, or SCSI in hot-swappable front-loading carriers.
- All servers have redundant power supplies in addition to the RAID'ed disks.
- All servers use either software RAID or 3ware hardware RAID cards.
- Approximately 36 NFS servers. For a list of all file systems, see NfsDirs
- All NFS file systems are auto-mounted on Linux members of the NIS domain using am-utils and autofs.
- 3 CIFS servers to make NFS file systems available to non-domain clients (Windows, OS X, and Linux)
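As a sketch, an autofs configuration for auto-mounting one of these NFS file systems might look like the following (the map name, mount point, server name, and export path are all hypothetical):

```
# /etc/auto.master -- hypothetical indirect map for NFS mounts
/nfs  /etc/auto.nfs  --timeout=300

# /etc/auto.nfs -- one entry per exported file system
data1  -rw,hard,intr  nfsserver1:/export/data1
```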
Current Deployments - 10Gb iSCSI Storage Area Networks
Approximately 300TB in three 10Gb iSCSI SANs constructed from commodity hardware, vendor-neutral technologies, and open-source software. The SANs support three Linux clusters and a Linux-based NetBackup system. Each SAN consists of:
- Two IBM (formerly BLADE Network Technologies) G8124 10Gb SFP+ switches
- One or more 10Gb iSCSI devices (currently Infortrend ESDS S16E-R2240 - http://www.infortrend.com/us/products/models/ESDS%20S16E-R2240)
  - Redundant hot-swappable controllers, power supplies, and fans
  - Proprietary snapshotting and remote-replication features available but not in use
  - Each controller connected to both switches
  - Single RAID6 array (with 1 hot spare) built using all drives on each device
- Multiple IBM x3550 1U servers
  - Scientific Linux 6 (RHEL6) with the Red Hat High Availability Add-On (formerly known as "Red Hat Cluster Suite")
  - Each with a 2-port Emulex OCE10102-NX 10Gb SFP+ NIC, one port connected to each switch in a bonded network configuration
  - iscsid, multipathd, and clustered LVM used with the RAID6 array from each iSCSI device
- Redundancy and failover configured to sustain the simultaneous failure of one switch, multiple cluster members, and one iSCSI controller per device
- Each Cluster / SAN can easily be upgraded with new or additional cluster members, iSCSI devices, or iSCSI expansion enclosures
- Each cluster used to serve high-availability services, file systems, and Linux and Windows KVM virtual machines.
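A minimal sketch of the device-mapper-multipath configuration for such a setup follows; the vendor/product strings and policy values here are assumptions and would have to match the actual Infortrend firmware:

```
# Hypothetical /etc/multipath.conf fragment
defaults {
    user_friendly_names yes
}
devices {
    device {
        vendor                "IFT"       # assumed Infortrend vendor string
        product               "*"
        path_grouping_policy  group_by_prio
        failback              immediate
        no_path_retry         12
    }
}
```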
CESR Control System Cluster
For its long-term requirements ("~5 more years"), CESR calculated that it would need 5TB of online data and 15TB of offline data.
- Expect 4MB/s from each of seven BPM systems and 2MB/s from each of three MPM systems
- Maximum data file size of 2GB
- Overall, expect to take ~250GB of data per year
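A quick back-of-envelope check of the rates above, comparing the peak aggregate rate with the sustained average implied by ~250GB/year:

```shell
# Peak aggregate ingest: seven BPM systems at 4MB/s plus three MPM systems at 2MB/s
peak=$(( 7*4 + 3*2 ))
echo "peak aggregate: ${peak} MB/s"

# ~250GB/year works out to only a few KB/s sustained (integer KB/s)
avg_kb=$(( 250 * 1024 * 1024 / (365 * 24 * 3600) ))
echo "yearly average: ~${avg_kb} KB/s"
```

The gap between the two numbers shows that the file systems mainly need to absorb short bursts far above the long-term average.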
Storage and Retrieval
All data is written to the 5TB CESR_ONLINE GFS2 file system, which is served from the 9-member CESR Cluster. One cluster member runs NFS and CIFS services to give non-cluster members access to the GFS2 file system. Many control-system processes and infrastructure services are also managed as high-availability services in the CESR Cluster.
Every hour, all data is rsync'd from CESR_ONLINE to the 15TB CESR_OFFLINE (an ext4 file system served from the CLASSE Cluster). In addition, every night all data that has not been modified for one day is made read-only. Finally, all data is backed up to tape nightly.
Metadata for data sets (run conditions, etc.) is stored in a MySQL database, accessed and maintained through in-house-developed front ends written in Perl, Python, and Java and running in clustered GlassFish application servers.
Analysis, simulations, etc. are done from Linux, Windows, and Mac OS X using the CESR_OFFLINE file system. Linux analysis and simulation jobs are submitted to the CLASSE GridEngine batch queuing system.
Our "Central Infrastructure" cluster is used to serve high-availability file systems and infrastructure services (DNS, DHCP, Apache, MySQL, PostgreSQL, GlassFish, NIS, NTP, etc.)
EXT4 file systems are served from iSCSI devices to clients using high-availability NFS and CIFS services.
Our "Test Cluster" is basically identical to the CESR and CLASSE Clusters. It is used for testing new services, debugging problems seen on the CLASSE and CESR clusters, and providing additional disaster-recovery disk-based backups by periodic rsyncs of file systems from the CLASSE and CESR clusters.
Symantec NetBackup 7.5 running on two Scientific Linux 6 servers.
- two Infortrend iSCSI devices
- two direct-attached LTO5 tape drives
- one IBM tape library
Backups are written to ext4 file systems on the iSCSI devices, then periodically flushed to tape and removed from disk.
New Projects and Requirements
Two types of data:
- (external) user data
- data from staff scientists' own experiments
Policies for data security and preservation are vague, but NSF requires at least the raw data to be kept (for how long?). Cornell supposedly has its own requirements for grad students.
See also https://confluence.cornell.edu/display/rdmsgweb/Home
Currently up to 360MB/s from time-resolved area detectors:
- 2-12MB per image (TIFF)
- collecting at 30 frames / second
In the future, with new undulators, x-ray flux (and, hence, data rates) could go up by factor of 10.
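The quoted rate follows directly from the image size and frame rate:

```shell
# 12MB/image (worst case) at 30 frames/second, per detector
rate=$(( 12 * 30 ))
echo "current worst case: ${rate} MB/s"
echo "with 10x flux:      $(( rate * 10 )) MB/s"
```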
Storage and Retrieval
Currently, at least 1TB of local storage at each station for data generated by the time-resolved area detectors as described above (360MB/s).
- Data would be collected in burst mode, cached locally, and then pushed to intermediate hot storage ~daily.
Need intermediate hot storage to preserve a year's worth of data from all area detectors. At least 10TB for up to 1 year at current conditions. Potentially 5-10x that amount in the future.
- 1TB / week average.
- 10-15TB for up to a year. (100 TB to be safe?)
All data more than a year old can be pushed off to long-term storage (tape in robot or on a shelf, off-site bulk storage, etc.). Compress (in a cross-platform way) before archiving?
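One cross-platform option is a gzip'd tar archive, which is readable on Linux, Windows, and OS X. A sketch (the dataset directory and file names are hypothetical, created here just for the demo):

```shell
# Create a throwaway dataset directory for the demo
WORK=$(mktemp -d)
mkdir -p "$WORK/dataset-demo"
echo "frame" > "$WORK/dataset-demo/img0001.tif"

# Bundle and compress the dataset before pushing it to long-term storage
tar -C "$WORK" -czf "$WORK/dataset-demo.tar.gz" dataset-demo

# List the archive contents to verify it
tar -tzf "$WORK/dataset-demo.tar.gz"
```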
Need to record metadata for effective data retrieval, as well as cross-correlation with other datasets.
- Currently, metadata is only recorded as part of file name.
- Consider using something like FITS (http://en.wikipedia.org/wiki/FITS)? It has been used in the astronomical community for decades.
Data analysis is done on Windows and Linux. Within 5 years, we may be running in-house data reduction on the fly (for example, possibly with a Hadoop cluster).