Cyberinfrastructure Technology Integration

Clemson Computing and Information Technology (CCIT) provides research cyberinfrastructure resources and advanced research computing capabilities through its Cyberinfrastructure Technology Integration (CITI) group.

Training Workshops

CITI partners with researchers across campus and across the country to offer a diverse catalog of advanced computing training opportunities for Clemson University students, researchers, faculty, and staff, as well as opportunities for our external partners at other universities and organizations. If you have problems with or questions about course registration, please email ithelp@clemson.edu with the words “Palmetto training” in the subject.

All workshops listed below are free for Clemson University students, faculty and staff. Registration is required, and can be done at https://cucourse.app.clemson.edu one week prior to the listed start dates.

Spring 2018 Schedule of Workshops

Introduction to Linux

A Software Carpentry-based introduction to the Linux command-line interface.

  • Monday, January 8, 9:00AM - 12:00PM. Location: Barre Hall B105
  • Friday, February 16, 9:00AM - 12:00PM. Location: TBD
  • Wednesday, March 14, 9:00AM - 12:00PM. Location: TBD
  • Friday, April 13, 9:00AM - 12:00PM. Location: Cooper Library 406A

Introduction to Version Control with Git/GitHub

This workshop covers Git, a revision control tool for tracking changes to projects such as technical papers, theses, or programs. Git supports more systematic work by saving “snapshots” of different versions of a project and making it easy to view or undo changes between snapshots, so you can change a project without fear of losing work or “breaking things”. The workshop also covers using Git with GitHub, a website for sharing projects with the world (or with collaborators) and for collaborating in groups more systematically than, say, e-mailing files to each other or using cloud storage.

  • Tuesday, January 9, 1:00PM - 4:00PM. Location: Cooper Library 406A
  • Tuesday, April 17, 9:00AM - 12:00PM. Location: Cooper Library 406A

Introduction to Research Computing on Palmetto Cluster

This workshop introduces participants to the Palmetto Cluster, Clemson University’s largest high-performance computing resource. It covers the cluster’s structure, its basic usage, and how to submit computational tasks to it.

  • Thursday, January 11, 9:00AM - 12:00PM. Location: Barre Hall B105
  • Friday, February 16, 1:00PM - 4:00PM. Location: TBD
  • Wednesday, March 14, 1:30PM - 4:30PM. Location: TBD
  • Friday, April 13, 1:00PM - 4:00PM. Location: Cooper Library 406A

Introduction to Programming in Python

This workshop introduces participants to programming, using the Python programming language, and is built around common scientific tasks such as loading, analyzing and visualizing data. The intended audience is researchers or students with no prior programming experience.

  • Part 1: Monday, January 8, 1:00PM - 4:00PM. Location: Barre Hall B105
  • Part 2: Tuesday, January 9, 9:00AM - 12:00PM. Location: Cooper Library 406A
  • Part 1: Monday, April 16, 9:00AM - 12:00PM. Location: Academic Success Center 118
  • Part 2: Monday, April 16, 1:00PM - 4:00PM. Location: Academic Success Center 118

Introduction to Data Science using R

An introduction to the R language for data analytics, using RStudio on a PC and Jupyter notebooks on Palmetto. Workshop contents include a basic understanding of R, installation of additional R packages, an introduction to data manipulation, an introduction to visualization, and several best practices for using R. No prior knowledge of R or of programming in general is required.

  • Tuesday, January 9, 9:00AM - 12:00PM. Location: TBD
  • Wednesday, April 4, 1:30PM - 4:30PM. Location: TBD

Data Mining using R

This workshop focuses on data mining techniques in R, with an emphasis on techniques for acquiring and curating data from online sources. For acquiring data, we will learn how to download from static links, crawl through entire websites, and stream data from real-time sources. For curating data, we will learn how to expand and extract information from acquired data, which is often stored in unstructured or semi-structured online formats (XML, JSON, …), into a structured format suitable for subsequent analysis. We will also learn about best practices in data management, including organizing data directories, working with databases, and automating the data-mining process on the Palmetto Supercomputer.

  • Thursday, January 11, 9:00AM - 12:00PM. Location: Barre Hall B106
  • Friday, April 6, 9:00AM - 12:00PM. Location: Academic Success Center 118

Introduction to Hadoop and MapReduce

This workshop teaches how to use Hadoop MapReduce and Python to perform large-scale data analytics. Learning outcomes include understanding the overall architecture of the Hadoop Distributed File System (HDFS) and the concept of MapReduce. Throughout the workshop, participants will learn to develop and run MapReduce programs, examine system logs to debug MapReduce applications, and optimize MapReduce applications.

  • Thursday, January 11, 1:00PM - 4:00PM. Location: Barre Hall B105
  • Tuesday, April 17, 9:00AM - 12:00PM. Location: TBD
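
For readers new to the MapReduce concept named in this workshop, the following is a minimal sketch of the idea in plain Python. It is not workshop material and does not use Hadoop: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical, and the whole job runs in a single process, whereas a real framework would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, the step a framework
    performs between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data on Palmetto", "big clusters process big data"]
    print(word_count(sample))
```

Because each mapper call sees only one line and each reducer call sees only one key, the work in both phases can be split across machines, which is the property that makes the model suitable for large-scale analytics.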

Introduction to Spark for fast in-memory big data processing using Python

This workshop teaches how to use Apache Spark and Python to perform large-scale in-memory data analytics. Learning outcomes include understanding the overall conceptual design of Spark and its advantages over traditional Hadoop MapReduce. Participants will also learn to develop Spark programs in Python and to leverage Spark-specific capabilities such as SQLContext and DataFrame to assist with data analytics.

  • Tuesday, January 16, 9:00AM - 12:00PM. Location: Barre Hall B105
  • Thursday, April 19, 9:00AM - 12:00PM. Location: Academic Success Center 118

Introduction to Machine Learning Techniques and Tools

The first half of this workshop focuses on machine learning techniques in Python (Scikit-learn), a widely used programming language for machine learning. The second half focuses on deep learning techniques in DIGITS, the NVIDIA Deep Learning GPU Training System, which is easy to learn and use. We will also learn how to run this processing on the Palmetto Supercomputer.

  • Thursday, January 18, 9:00AM - 12:00PM. Location: Barre Hall B105
  • Monday, April 9, 9:00AM - 12:00PM. Location: Academic Success Center 118

Scientific Programming using Python

This workshop is intended for those already programming in Python or a similar language, and will cover organizing and writing scientific programs and improving their quality and efficiency. Topics covered will include:

  1. Packaging in Python (i.e., how to organize and share your code)
  2. Advanced NumPy and Pandas for scientific programming
  3. Test-driven development, debugging, and profiling
  4. Performance programming: Parallel Python, interfacing with C and Fortran

  • Tuesday, January 23, 9:00AM - 4:00PM. Location: Barre Hall B105

Data Visualization using R

In this session, we will go over various techniques to format and visualize data using complex plots and the highly popular ggplot package in R.

  • Thursday, February 8, 9:00AM - 12:00PM. Location: Cooper Library 406A

Big Data Analytics using R

Memory and processor limits are significant challenges when analyzing large amounts of data in R. In this session, we will learn how to use Spark, an in-memory distributed infrastructure, to process massive amounts of data in R.

  • Thursday, January 25, 9:00AM - 12:00PM. Location: Barre Hall B105

Textual Data Analytics using R

In this session, we will learn how to use natural language processing techniques to manipulate and study textual data in R.

  • Thursday, February 15, 9:00AM - 12:00PM. Location: Academic Success Center 118

Introduction to GPU Programming

  • Tuesday, January 30, 9:00AM - 12:00PM. Location: Barre Hall B105

Scientific Visualization with ParaView

This training session provides an introduction to scientific visualization using ParaView. The topics covered are:

  • How to load different datasets and simulation results in ParaView.
  • How to apply predefined filters to loaded datasets to extract information about simulation results and create animations.
  • How to connect ParaView to the Palmetto cluster to load large data structures and work with them using parallel visualization.
  • How to send scientific visualization datasets to HTC Vive headsets to interact with data structures in a virtual reality environment.
  • How to use the Python programming language to create customized filters and handle complex data structures.
  • Real cases of simulation results in ParaView that demonstrate the capabilities of this scientific visualization software.

  • Friday, January 26, 11:00AM - 12:00PM (This time is subject to change). Location: TBD

Scientific Visualization with VisIt

This training session provides an introduction to scientific visualization using VisIt. The topics covered are:

  • How to load different datasets and simulation results in VisIt.
  • How to apply predefined filters to loaded datasets to extract information about simulation results and create animations.
  • How to connect VisIt to the Palmetto cluster to load large data structures and work with them using parallel visualization.
  • Real cases of simulation results in VisIt that demonstrate the capabilities of this scientific visualization software.

  • Friday, February 2, 11:00AM - 11:30AM (This time is subject to change). Location: TBD

Scientific Visualization with VMD

This training session provides an introduction to molecular dynamics and biomolecular visualization using VMD. The topics covered are:

  • How to open different molecular and biomolecular structures in VMD and extract regions of interest, such as the hydrophilic and hydrophobic parts of a molecule.
  • How to display molecular structures with different visualization styles, such as chains or ribbons, and color the structures by computed fields such as temperature or atomic movement.
  • How to create animations from dynamic molecular simulations and extract positional information about atoms, such as RMSD.
  • How to install and use VMD on the Palmetto cluster to work with large molecular structures.
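
As a point of reference for the RMSD (root-mean-square deviation) mentioned in the topics above, the sketch below computes it in plain Python for two sets of atom coordinates. This is an illustration of the formula only, not VMD's own interface: the function name rmsd and the sample coordinates are hypothetical, and the sketch assumes the two structures list the same atoms in the same order and have already been aligned, since no superposition step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) atom positions, assumed to be aligned already."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical two-atom frames, for illustration only.
    frame_0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    frame_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(frame_0, frame_1))
```

A value of zero means the two frames coincide; larger values indicate that atoms have moved further, on average, between the frames.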

  • Friday, February 9, 11:00AM - 11:30AM (This time is subject to change). Location: TBD

Scientific Visualization with CUDA

This training session presents real cases of using CUDA/OpenGL in scientific visualization, then covers how to use the Palmetto cluster to combine CUDA and high-performance computing for visualizing large data structures.

  • Friday, February 16, 11:00AM - 11:30AM (This time is subject to change). Location: TBD

GIS Training

For GIS training, please visit the Clemson Center for Geospatial Technologies.