Cyberinfrastructure Technology Integration

Clemson Computing and Information Technology (CCIT) provides research cyberinfrastructure resources and advanced research computing capabilities through its Cyberinfrastructure Technology Integration (CITI) group.

Clemson Research Cyberinfrastructure

Palmetto Cluster

Palmetto Cluster is a local high-performance computing environment available to all Clemson students, faculty, and staff as a dedicated research environment.

Overview

  • available for free to all Clemson students, faculty, and staff
  • 2021 compute nodes, 23072 cores
  • heterogeneous configuration with various types of nodes (different CPU, memory, network, disk space)
  • 386 nodes are equipped with NVIDIA Tesla GPUs: 280 nodes with NVIDIA K20 GPUs (2 per node), 106 nodes with NVIDIA K40 GPUs (2 per node)
  • 4 nodes with Intel Phi co-processors (2 per node)
  • 6 large memory nodes (5 with 505GB, 1 with 2TB), 262 nodes with 128GB of memory
  • 100GB of personal space (backed up daily for 42 days)
  • Myrinet, 10 Gbps Ethernet, and InfiniBand networks
  • global and local scratch spaces for temporary files (no quota per user)
  • maximum run time for a single task limited to 72 hours on the InfiniBand part and 168 hours on the Myrinet part
  • ranked 4th among public academic institutions in the US on the Top500 list (155 on Top500) with performance of 814.4 TFlops (17,372 cores from the InfiniBand part of Palmetto)

More about Palmetto
New account on Palmetto
Reservation request

Condominium model

The Palmetto cluster operates on a condominium model, which allows faculty to invest in the cluster. Investments in Palmetto take the form of compute node purchases. By purchasing a compute node, faculty gain priority access to equivalent hardware across the whole Palmetto cluster. All unused compute cycles are made available to general Clemson users. Owners may preempt other users, making the hardware they purchased immediately available. Purchased nodes are available to faculty for a period of 4 years; after that, the priority to use them expires.

Being an owner allows users to:

  • have immediate access to the amount they have purchased by preempting other users
  • have a dedicated group on Palmetto cluster
  • invite external collaborators (not associated with Clemson) to use their purchased resources
  • have extended maximum time for a single task up to 336 hours (14 days)
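
The run time limits quoted on this page (72 hours on the InfiniBand part, 168 hours on the Myrinet part, and up to 336 hours for owners) can be checked before a job is submitted. The following is a minimal sketch, not part of any Palmetto tool: the function names and partition labels are hypothetical, and it assumes the owner limit applies regardless of which part of the cluster the job runs on.

```python
# Limits in hours, taken from the figures quoted on this page.
MAX_WALLTIME_HOURS = {"infiniband": 72, "myrinet": 168}
# Assumption: the extended owner limit applies on either part of the cluster.
OWNER_MAX_WALLTIME_HOURS = 336


def max_walltime_hours(partition, is_owner=False):
    """Return the longest allowed run time (hours) for a single task."""
    key = partition.lower()
    if key not in MAX_WALLTIME_HOURS:
        raise ValueError(f"unknown partition: {partition!r}")
    return OWNER_MAX_WALLTIME_HOURS if is_owner else MAX_WALLTIME_HOURS[key]


def fits_walltime(requested_hours, partition, is_owner=False):
    """True if the requested run time is positive and within the limit."""
    return 0 < requested_hours <= max_walltime_hours(partition, is_owner)
```

For example, fits_walltime(100, "infiniband") is false for a general user, while the same request with is_owner=True is within the 336-hour owner limit.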

For more information about the condominium model and purchasing Palmetto nodes, including Palmetto nodes on grants, please contact Jeronica Williams (jeronic@clemson.edu) or Marcin Ziolkowski (zziolko@clemson.edu).

Owners guide for Palmetto cluster

Temporary storage

Palmetto includes several file systems designed for storing temporary files:

  1. Local disk on compute nodes
  2. Following global scratch systems

File system: OrangeFS
Directory: /scratch1
Capacity: 233 TB
Features:
  • distributed file system based on OrangeFS
  • available to all compute nodes and the login node
  • no quota per user
  • files not accessed for 30 days deleted on first day of each month
  • designed for parallel I/O

File system: ZFS
Directory: /scratch2
Capacity: 150 TB
Features:
  • single server sharing space to all compute nodes and the login node
  • no quota per user
  • files not accessed for 30 days deleted daily
  • designed for general I/O patterns (small and/or single process I/O)

File system: XFS
Directory: /scratch3
Capacity: 129 TB
Features:
  • single server sharing space to all compute nodes and the login node
  • no quota per user
  • files not accessed for 30 days deleted daily
  • designed for general I/O patterns (small and/or single process I/O)
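
Because files on the scratch systems are deleted once they have not been accessed for 30 days, it can be useful to list the files that are close to removal. The sketch below is illustrative only: the function name is hypothetical, and it assumes the purge decision is based on each file's last access time (st_atime), which this page implies but does not state precisely.

```python
import os
import time

PURGE_AGE_DAYS = 30  # "not accessed for 30 days", per the scratch policy


def files_at_risk(root, max_age_days=PURGE_AGE_DAYS, now=None):
    """Return sorted paths under root whose last access is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

A call such as files_at_risk("/scratch2/username") would return the candidate files, where the per-user directory name is a placeholder rather than a documented path.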

Secure data server

CCIT hosts a server dedicated to preparing sensitive data for further use on either the Palmetto or Cypress clusters. This system is meant to be used for de-identification of any Protected Health Information (HIPAA) before further analysis. The following identifiers, when combined with medical information, qualify any data as sensitive:

  • Names
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Social Security numbers
  • Medical record numbers
  • Health plan numbers
  • License plate numbers
  • URLs
  • Full-face photographic images
  • Any other unique identifying marker that allows identification of an individual
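
To make the idea of de-identification concrete, the sketch below masks a few of the pattern-like identifiers from the list above (URLs, email addresses, Social Security numbers, and US-style telephone numbers) in free text. It is an illustration only and not the tooling installed on the secure server: the patterns and the redact function are assumptions, they will miss many real-world formats, and identifiers such as names or photographs cannot be handled this way.

```python
import re

# Order matters: URLs are masked first so that later patterns do not
# partially match inside them.
PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact(text):
    """Replace each matched identifier with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Any real de-identification workflow should be reviewed against the applicable HIPAA requirements rather than relying on simple pattern matching.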

The system dedicated to the initial processing of sensitive information is equipped with tools that enhance its security, and the processes involving this system (access, monitoring, audit, administration) are separate from the general ones used for Palmetto or Cypress.

Security measures for the PHI research system

  • Secure location with 24/7 monitoring of access
  • Access to the system is limited to a restricted list of users
  • Access limited to a restricted list of IP addresses, controlled at both the network and firewall levels
  • Secure shell only access
  • Two-factor authentication
  • Monitoring of all commands on the system
  • Encrypted file system
  • Periodic security audit

For access to the secured research system, please contact the Advanced Computing and Data Science group.

HIPAA


Long term storage

Long term storage solutions are available to users seeking dedicated high-performance storage. This service is provided for a fee to Clemson users. Palmetto users may purchase ZFS storage on either a dedicated server (purchase of 150 TB) or a shared server (purchase in 1 TB increments).

Long term storage space includes snapshots of changes and a mirror system for disaster recovery.

More information about long term storage


Cypress Cluster

Clemson cyberinfrastructure includes a dedicated Hadoop environment, called Cypress, that is integrated with Palmetto’s infrastructure. The Cypress Cluster uses the Hortonworks Data Platform distribution of Hadoop and Spark to support data intensive computing and analytics. Cypress is available to all students, faculty, and staff with Palmetto Cluster accounts.

Overview

  • available for free to all Clemson students, faculty, and staff
  • 3.64 PB (petabyte) global Hadoop Distributed File System (HDFS)
  • 40 worker nodes (responsible for computation and data storage)
    • 256 GB of RAM per node
    • 16 nodes each have 12 1-TB local disks
    • 24 nodes each have 24 6-TB local disks
  • one dedicated Cypress Cluster user node for job submission and data staging
  • Hortonworks Data Platform distribution of Hadoop, Spark, and other Hadoop ecosystem services
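
The 3.64 PB HDFS figure can be cross-checked against the per-node disk counts listed above. The short calculation below is a sketch under two assumptions that this page does not state: that the figure counts raw disk capacity before replication, and that TB and PB are decimal units (1 PB = 1000 TB).

```python
# (number of nodes, disks per node, TB per disk), from the list above
NODE_GROUPS = [
    (16, 12, 1),
    (24, 24, 6),
]


def raw_capacity_tb(groups):
    """Sum nodes * disks * disk size over all node groups, in TB."""
    return sum(nodes * disks * size_tb for nodes, disks, size_tb in groups)


total_nodes = sum(nodes for nodes, _disks, _size in NODE_GROUPS)
total_tb = raw_capacity_tb(NODE_GROUPS)
total_pb = total_tb / 1000  # assumes decimal petabytes
print(total_nodes, total_tb, total_pb)
```

The two groups add up to the 40 worker nodes listed, and 16 x 12 x 1 TB plus 24 x 24 x 6 TB gives 3648 TB, or about 3.65 PB, in line with the quoted 3.64 PB.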

For more information about investing in the Cypress Cluster, please contact Jeronica Williams (jeronic@clemson.edu) or Linh Ngo (lngo@clemson.edu).

More about Cypress


Support

Research support for the Palmetto and Cypress Clusters is provided by the Advanced Computing and Data Science group.


GalaxyGIS Cluster

The Clemson Center for Geospatial Technologies (CCGT) cyberinfrastructure includes a High Throughput Computing pool, called GalaxyGIS, to address the needs of desktop GIS users who need additional computational power for their GIS analysis. The GIS Cluster consists of over 30 Windows computers with installed GIS programs and a scheduler that distributes GIS jobs across available nodes for parallel processing. GalaxyGIS is available to all students, faculty, and staff with Palmetto Cluster accounts.

Overview

  • available for free to all Clemson students, faculty, and staff
  • 740 computer nodes (responsible for processing and computation)
  • 1 TB of personal space (not backed up)

Support

GalaxyGIS Cluster support is provided by the GIS group


Open Science Grid

Open Science Grid (OSG) is a freely accessible distributed computing resource for scientific calculations, designed to handle a huge number of “small” computational tasks, known as high throughput computing (HTC).

Clemson University has been working with OSG on providing seamless access to the OSG resources for Clemson researchers. OSG has been recently integrated into the Palmetto cluster for sending and receiving high throughput jobs using the OSG framework. Access to OSG is free of charge.

Access to OSG from Palmetto is available through the Connect Client software. OSG uses a separate accounting system, so Clemson users need to request an OSG account before using it.

Open Science Grid
OSG Connect
New OSG Account


XSEDE resources

The Extreme Science and Engineering Discovery Environment (XSEDE) is a collection of national advanced cyberinfrastructure resources. XSEDE provides access to both dedicated computing systems and experts in computationally oriented research areas. Computing resources include Stampede, Comet, SuperMIC, Jetstream, Wrangler, Bridges, and other systems.

For more information about XSEDE resources, contact one of the XSEDE Campus Champions at Clemson University.

XSEDE
List of XSEDE resources
XSEDE Allocations


Visualization Lab

The Visualization Lab (Barre Hall 2004) provides cyberinfrastructure for the visualization and virtual reality needs of Clemson students, faculty and staff, including:

  • Virtual reality head-mounted displays (Oculus Rift, Microsoft HoloLens, Samsung Gear VR, etc.)
  • Visualization workstations equipped with high-end NVIDIA graphics cards
  • Tiled displays and 3-D projector
  • Visualization cluster with 5 nodes and 40 Gbps connection to Palmetto Cluster

For details about available resources, please see here, or contact Wole Oyekoya ooyekoy@clemson.edu. For events, demos, and office hours, please see the visualization calendar.


Network

The Clemson network infrastructure is connected to a high-speed network provided by Internet2. This high-speed (100 Gbps) network provides external connectivity to the Palmetto cluster and to selected buildings on the main campus.

Data transfer node (DTN)

Palmetto infrastructure includes two servers dedicated to fast transfer of data.

  • xfer01-ext.palmetto.clemson.edu is a node dedicated to large file transfers using traditional tools such as scp and FileZilla. This server also hosts the Globus endpoint for the Palmetto cluster. Palmetto /home and all scratch file systems are available on this server.

  • hpcdtn01-ext.clemson.edu is a server dedicated to large file transfers over Internet2. This DTN is part of the Pacific Research Platform and includes a large SSD-based file system to provide the best disk-to-disk transfer speed.
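
As an illustration of a transfer through xfer01-ext.palmetto.clemson.edu with scp, the sketch below only assembles the command line and does not run it. The helper name, the username, and the destination path under /scratch1 are placeholders, not documented values; the only scp option used is the standard -r flag for recursive copies.

```python
import shlex

# Transfer node named on this page.
XFER_HOST = "xfer01-ext.palmetto.clemson.edu"


def build_scp_command(local_path, username, remote_path, recursive=False):
    """Build an scp argument list that copies local_path to the transfer node."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # copy directories recursively
    cmd += [local_path, f"{username}@{XFER_HOST}:{remote_path}"]
    return cmd


# Placeholder username and destination directory, shown as a shell command.
print(shlex.join(build_scp_command("results.tar.gz", "username", "/scratch1/username/")))
```

The returned list can be passed to subprocess.run to perform the copy, subject to whatever authentication the server requires.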

Pacific Research Platform (PRP)

The Pacific Research Platform is a collaboration of universities to establish fast network connections between servers dedicated to data transfer. Clemson University is a PRP partner, and hpcdtn01-ext.clemson.edu is part of the PRP network. The list of all participating institutions and their DTNs is available on the PRP Dashboard, which also allows the connection speed between different sites to be assessed.

PRP Dashboard
Pacific Research Platform
Internet 2
Advanced Layer 2 Service
ESnet Fasterdata Knowledge Base


Research support

The Cyberinfrastructure Technology Integration (CITI) group supports Clemson University researchers across broadly defined research computing. CITI provides workshops covering an introduction to HPC systems, an introduction to programming for researchers, and area-specific research computing. CITI staff help researchers use local and external dedicated computing resources and assist in porting and optimizing workflows.

The CITI group includes the following subgroups:

Advanced Computing and Data Science

Advanced Visualization

Geographic Information System Group

Program Manager

More about CITI group
CITI Training and workshops


Prices

All prices cover a 4-year term.

Type: Storage
Unit: 1 TB
Description:
  • ZFS system available only to Palmetto cluster
  • Snapshots included in user space
  • Full mirror for system recovery
Price: $150.00
Comments: Owners of existing SAMQFS spaces may expand existing storage for the same price as ZFS storage

Type: Palmetto compute node
Unit: 1 unit
Description:
  • 2 x Intel Xeon E5-2680v3 “Haswell” @2.5 GHz (for a total of 24 cores)
  • 2 x NVIDIA Tesla K40c GPU accelerators
  • 128 GB DDR4 RAM
  • 2 x 1 TB local hard drives
  • On-board 10 Gbps Ethernet NIC
  • InfiniBand FDR 56 Gbps network card
Price: $6250.00
Comments: All grant budgets should assume $8000 price as a projected price for future expansions of Palmetto

Type: Cypress (Hadoop) node
Unit: 1 unit
Description:
  • 2 x Intel Xeon CPU E5-2680v3 “Haswell” @2.5 GHz (24 cores)
  • 256 GB DDR4 RAM
  • 24 x 6 TB 7200 RPM local hard disks for data storage (144 TB total*)
  • 2 x 300 GB 10k RPM local hard disks for host system
  • On-board 10 Gbps Ethernet NIC
Price: $16,224.00
Comments: All grant budgets should assume $20,000.00 price as a projected price for future expansions of Cypress.

*Usable storage on a Cypress node may be less than the included 144 TB depending on the configured HDFS replication factor. Using the Cypress Cluster default HDFS replication factor of 2, your usable data capacity would be 72 TB. The replication factor is user-configurable at the file level. We recommend using a replication factor of at least 2 to mitigate the risk of losing data due to a hardware failure.
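
The relationship between raw disk space and usable capacity described in the note above is a simple division by the replication factor. The sketch below reproduces the 72 TB figure; the function name is hypothetical, and the calculation ignores any other overhead, such as space reserved for non-HDFS use.

```python
def usable_capacity_tb(raw_tb, replication_factor):
    """Usable HDFS capacity when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication_factor


# One Cypress node (24 x 6 TB = 144 TB raw) at the default replication factor of 2.
print(usable_capacity_tb(144, 2))  # 72.0
```

With a replication factor of 3, the same node would provide 48 TB of usable space.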