Awesome HPC

High Performance Computing tools and resources for engineers and administrators.

High Performance Computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

Provisioning

  • Grendel - Bare metal provisioning system for HPC Linux clusters ([Source Code](https://github.com/ubccr/grendel)) GPL-3.
  • xCAT - xCAT is a toolkit for deployment and administration of clusters of all sizes (Source Code) EPL-1.0.
  • Warewulf - Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems (Source Code) BSD-3.
  • Rocks - A Linux distribution for developing Linux clusters other.
  • Cobbler - Cobbler is a Linux installation server that allows for rapid setup of network installation environments (Source Code) GPL-2.0.
  • Base Command Manager - Base Command Manager allows administrators to quickly build and manage heterogeneous clusters Proprietary.
  • Scyld - Scyld ClusterWare is developed based on the continuing evolution of Beowulf clusters, first developed at NASA in the 1990s Proprietary.
  • BlueBanquise - BlueBanquise is an open source cluster deployment and management stack built on Python and Ansible (Source Code) MIT.

Workload Managers

  • Slurm - A free and open source job scheduler (Source Code) OSS.
  • LSF - A job scheduler and workload management software developed by IBM Proprietary.
  • Moab - Moab is a workload management and job scheduler other.
  • Torque - Torque is a workload management and job scheduler other.
  • OpenLava - OpenLava is a workload management and job scheduler other.
  • UGE/SGE - Univa Grid Engine is a workload management engine for HPC Proprietary.
  • Volcano - Volcano is a batch system built on Kubernetes Apache-2.0.
  • Maui - Maui is a workload management and job scheduler other.
  • Kube Batch - A batch scheduler for Kubernetes targeting high-performance workloads, e.g. AI/ML, BigData, HPC Apache-2.0.
  • OpenPBS - OpenPBS® software optimizes job scheduling and workload management in high-performance computing (HPC) environments (Source Code) other.

Pipelines

  • Nextflow - Data-driven computational pipelines Apache-2.0.
  • Cromwell - Scientific workflow engine designed for simplicity & scalability (Source Code) BSD-3.
  • Pegasus - A configurable system for mapping and executing scientific workflows over a wide range of computational infrastructure (Source Code) Apache-2.0.

Applications

  • Spack - A flexible package manager that supports multiple versions, configurations, platforms, and compilers (Source Code) other.
  • EasyBuild - EasyBuild - building software with ease (Source Code) GPL-2.

Compilers

  • Nvidia - NVIDIA HPC compiler suite for Fortran, C/C++ with OpenACC Proprietary.
  • Portland Group - The Portland Group compilers were Fortran, C/C++ compilers now integrated into NVIDIA HPC SDK Proprietary.
  • Intel - The Intel compiler suite offers many language compilers for use in the HPC space Proprietary.
  • Cray - A suite of compilers designed and optimized to target the AMD Interlagos instruction set Proprietary.
  • GNU - The GNU Compiler Collection is a suite of compilers targeting many languages (Source Code) GPL-3.
  • LLVM - The LLVM project is a collection of modular compilers and toolchains (Source Code) OSS.

MPI

  • OpenMPI - OpenMPI is an open source implementation of the MPI-3.1 standard (Source Code) BSD.
  • MPICH - MPICH is a high-performance and widely portable implementation of the MPI-3.1 standard (Source Code) other.
  • MVAPICH - MVAPICH is an open source implementation of the MPI-3.1 standard developed by Ohio State University BSD.
  • Intel-MPI - Intel-MPI is Intel's MPI-3.1 implementation included in their compiler suite other.

Parallel Computing

  • ArrayFire - A general purpose tensor library that simplifies the process of software development for parallel architectures other.
  • OpenMP - OpenMP is an application programming interface that supports multi-platform shared-memory multiprocessing programming other.

Benchmarking

  • OSU Benchmarks - A collection of benchmarking tools for MPI developed by Ohio State University other.
  • Intel MPI Benchmarks - A set of benchmarks developed by Intel for use with their Intel MPI other.
  • HPCC Systems - HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processing and analytics (Source Code) other.
  • LINPACK - LINPACK is a set of efficient Fortran subroutines for solving linear systems, whose benchmarks are useful for HPC other.
  • IOzone - IOzone is a filesystem benchmark tool OSS.
  • IOR - Interleaved or Random is a useful benchmarking tool for testing parallel filesystems other.
  • MDtest - MDtest is an MPI-based application for evaluating the metadata performance of a file system other.
  • FIO - Flexible I/O is an advanced disk benchmark that depends upon the kernel's AIO access library (Source Code) GPL-2.
  • elbencho - A distributed storage benchmark for files, objects & blocks with support for GPUs GPL-3.

Miscellaneous

  • OpenOnDemand - Open OnDemand helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device (Source Code) MIT.
  • Open XDMoD - Open XDMoD is an open source tool to facilitate the management of high performance computing resources (Source Code) LGPL-3.
  • Coldfront - ColdFront is an open source resource allocation system designed to provide a central portal for administration, reporting, and measuring scientific impact of HPC resources (Source Code) GPL-3.
  • Pavilion2 - Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems (Source Code) other.
  • Reframe - A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems (Source Code) BSD-3.
  • OLCF Test Harness - The OLCF Test Harness (OTH) helps automate the testing of applications, tools, and other system software (Source Code) other.
  • GoSlmailer - Goslmailer is a drop-in notification delivery solution for Slurm that can deliver to Slack, Mattermost, Teams, and more.

Performance

  • TotalView - TotalView is a debugging tool for HPC applications Proprietary.
  • Tau - TAU Performance System® is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python other.
  • Valgrind - Valgrind is a tool suite for profiling programs and detecting memory leaks (Source Code) GPL-2.
  • Paraver - Paraver is a very flexible data browser that is part of the CEPBA-Tools toolkit other.
  • PAPI - Performance Application Programming Interface (PAPI) is a performance analysis tool (Source Code) other.

Parallel Shells

Containers

  • Apptainer - Apptainer is an open source container system (Source Code) BSD.
  • Charliecloud - Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers (Source Code) Apache-2.0.
  • Docker - Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers other.
  • uDocker - A basic user tool to execute simple docker containers in batch or interactive systems without root privileges (Source Code) Apache-2.0.
  • Shifter - Shifter provides Linux containers for HPC (Source Code) other.
  • HPC Container Maker - HPC Container Maker is an open source tool to make it easier to generate container specification files Apache-2.0.
  • Sarus - An OCI-compatible container engine for HPC BSD.
  • Singularity HPC - Singularity Registry HPC (shpc) allows you to install containers as modules (Source Code) MPL 2.0.

Environment Management

  • Lmod - Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy (Source Code) other.
  • Environment Modules - Environment Modules: provides dynamic modification of a user's environment (Source Code) GPL-2.
  • Anaconda - Anaconda is a Python and R distribution for use in computational science other.
  • Mamba - Mamba is a reimplementation of the conda package manager in C++ (Source Code) BSD.

Visualization

  • Visit - VisIt - Visualization and Data Analysis for Mesh-based Scientific Data (Source Code) BSD-3.
  • Paraview - ParaView is an open-source, multi-platform data analysis and visualization application based on Visualization Toolkit (VTK) (Source Code) BSD-3.

Parallel Filesystems

  • GPFS - GPFS is a high-performance clustered file system software developed by IBM Proprietary.
  • Quobyte - A high performance filesystem Proprietary.
  • Ceph - Ceph is a distributed object, block, and file storage platform (Source Code) other.
  • Weka - A file system designed for HPC Proprietary.
  • Lustre/Exascaler - Lustre is an open-source, distributed parallel file system software platform designed for scalability, high-performance, and high-availability (Source Code) other.
  • BeeGFS - BeeGFS is a hardware-independent POSIX parallel file system developed with a strong focus on performance and designed for ease of use, simple installation, and management Proprietary.
  • OrangeFS - OrangeFS is a next generation parallel file system for Linux clusters (Source Code) other.
  • MooseFS - Moose File System is an open-source, POSIX-compliant distributed file system developed by Core Technology (Source Code) GPL-2.0.

Programming Languages

  • Julia - Julia is a high-level, high-performance dynamic language for technical computing MIT.
  • Futhark - Futhark is a purely functional data-parallel programming language in the ML family ISC.
  • Chapel - Chapel is a programming language designed for productive parallel computing at scale Apache-2.0.

Monitoring

Prometheus Based

  • Slurm Exporter - Prometheus exporter for performance metrics from Slurm GPL-3.0.
  • Slurm Exporter - Slurm Exporter for Prometheus using Rest API GPL-3.0.
  • Infiniband Exporter - The InfiniBand exporter collects counters from InfiniBand switches and HCAs Apache-2.0.
  • Cgroup Exporter - Produces metrics from cgroups Apache-2.0.
  • Cgroup Exporter - A Prometheus exporter for cgroup-level metrics unknown.
  • GPFS Exporter - The GPFS exporter collects metrics from the GPFS filesystem Apache-2.0.
  • Lustre Exporter - Prometheus exporter for use with the Lustre parallel filesystem GPL-3.0.
  • DCGM Exporter - NVIDIA GPU metrics exporter for Prometheus leveraging DCGM Apache-2.0.

Journals

Podcasts

  • This week in HPC - Each week, Intersect360 Research CEO Addison Snell and HPCwire editor Tiffany Trader dissect the week's top HPC stories.
  • Exascale Project - ECP's Let's Talk Exascale podcast goes behind the scenes to chat with some of the people who are bringing a capable and sustainable exascale computing ecosystem to fruition.
  • @HPCpodcast - Join Shahin Khan and Doug Black as they discuss Supercomputing technologies and the applications, markets, and policies that shape them.

Blogs

  • HPCWire - Since 1987 covering the fastest computers in the world and the people who run them.
  • InsideHPC - insideHPC is a global publication recognized for its comprehensive and insightful coverage of the HPC-AI community, linking vendors, end-users and HPC strategists.
  • The Next Platform - Offers in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
  • The Register HPC - The Register is a leading and trusted global online enterprise technology news publication, reaching roughly 40 million readers worldwide.
  • HPC at Dell - High-Performance Computing knowledge base articles from Dell.

Conferences

  • PEARC - Practice & Experience in Advanced Research Computing.
  • Supercomputing (SC) - The International Conference for High Performance Computing, Networking, Storage, and Analysis.
  • International Supercomputing Conference (ISC) - ISC High Performance, an annual conference held in Germany covering high performance computing, machine learning, and data analytics.
  • CCGrid - IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.
  • IEEE-HPEC - IEEE High Performance Embedded Computing.
  • Hot Chips - Semiconductor industry's leading conference on high-performance microprocessors and related circuits.
  • Hot Interconnects - IEEE conference on software architectures and implementations for interconnection networks of all scales.
  • ESSA - Workshop on Extreme-Scale Storage and Analysis.
  • IEEE-IPDPS - IEEE International Parallel & Distributed Processing Symposium.
  • ESPM2 Workshop - International Workshop on Extreme Scale Programming Models and Middleware.
  • LCI Workshops - The Linux Clusters Institute (LCI) provides education and advanced technical training on the deployment and use of computing clusters to the high performance computing community worldwide.
  • HPC Carpentry - Teaching basic skills for high-performance computing.

Websites

  • Top500 - The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world.

User Groups

  • MVAPICH - The MUG conference provides an open forum for all attendees (users, system administrators, researchers, engineers, and students) to discuss and share their knowledge on using MVAPICH libraries.
  • Slurm - The annual Slurm user group meeting.

Contributing

Contributing guidelines can be found at https://github.com/dstdev/awesome-hpc/blob/master/contributing.md.