When complex design and manufacturing tasks like finite element analysis (FEA), fluid-flow modeling and thermal imaging first appeared, they required expensive mainframe infrastructure. More recently, the advent of high performance workstations has allowed engineers to design, visualize and simulate products from concept through manufacturing locally, on a desk-side workstation. The benefits have been enormous: savings in time, cost and productivity have fueled some of the most impressive innovations ever produced.
However, the escalating complexity of current design and simulation applications is once again grinding productivity to a crawl. Simulation jobs can easily run for several hours as millions of interrelated computations are performed. Bottlenecks persist because modeling and simulation cannot run in parallel on a single machine; model sizes are growing exponentially; and complex simulations, when parallelized, can exceed eight computational threads and consume hundreds of gigabytes of memory. That easily outstrips the capacity of a single desk-side workstation, even one running overnight.
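A back-of-envelope calculation shows why memory, not just runtime, is the limit. The sketch below estimates the working set of a large FEA job; every figure in it (mesh size, matrix density, solver fill-in) is an illustrative assumption, not a benchmark from the article:

```python
# Rough memory estimate for a large sparse FEA model.
# All numeric parameters below are illustrative assumptions.

mesh_nodes = 20_000_000      # assumed mesh size
dof_per_node = 3             # x, y, z displacement unknowns
nnz_per_row = 60             # assumed nonzeros per stiffness-matrix row
bytes_per_entry = 8 + 4      # double value + 32-bit column index (CSR-style)

dof = mesh_nodes * dof_per_node
matrix_gb = dof * nnz_per_row * bytes_per_entry / 1e9

# Direct solvers suffer fill-in during factorization;
# 5x is a rough, assumed multiplier.
solver_gb = matrix_gb * 5

print(f"stiffness matrix: ~{matrix_gb:.0f} GB")
print(f"factorization working set: ~{solver_gb:.0f} GB")
```

Under these assumptions the matrix alone is roughly 43 GB and the solver's working set runs past 200 GB, well beyond a typical desk-side workstation.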
Engineering managers must address these new bottlenecks without ceding control of computing resources to an already-overworked IT department, and without returning to the days of centralized, mainframe-style machines dedicated to one or a few specialized tasks. A growing number of engineering departments are doing this through the deployment of "High Performance Computing (HPC) Clusters."
Once the bastion of the most complex and expensive problems in government and theoretical research, HPC clusters formerly required specialized expertise to design, build and manage. Having benefited from years of development in both hardware and system-management software, many HPC clusters now serve as department-level resources, ranging in size from two to 32 nodes (a node is an individual interconnected server). And because they're tightly connected, HPC clusters provide benefits that are not available with workstations:
- Clusters can scale. Instead of running simulations on their personal workstations, designers can spread jobs across multiple servers in a cluster.
- Clusters restore individual and group productivity. A cluster can run 24x7, allowing designers to dedicate personal workstations to creating models. And with workload-management software, high-priority jobs can be submitted to the cluster without impacting a designer's workday efforts.
- Clusters can grow. Today, when a workstation is no longer sufficient to handle the workload, an entirely new workstation must be purchased, installed and configured. With an HPC cluster, however, additional nodes are added as needed. And cluster-management software can automatically configure the new nodes to work seamlessly with the existing cluster.
- Clusters can be highly available. Failed jobs can be restarted automatically. Workload-management software can maintain an inventory of cluster status and distribute jobs only to nodes with available capacity. And simulation jobs can also be "bound" to specific nodes, where specific processors are located.
- Clusters can be shared. Today, designers using different applications on personal workstations cannot share resources. On a cluster, by contrast, cluster-management and workload-management software work in unison to repurpose nodes on demand, switching among multiple operating systems and applications.
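The scheduling behaviors described above (priority queues, capacity-aware placement, node binding, and re-queuing) can be sketched in a few dozen lines. This is a minimal illustration of the concepts only; the class and field names are invented for this sketch and do not reflect any vendor's workload-management API:

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

# Minimal sketch of workload-management dispatch: priority ordering,
# capacity-aware placement, node binding, and re-queuing of jobs
# that cannot be placed. Names are illustrative, not a real API.

@dataclass(order=True)
class Job:
    priority: int                      # lower value = dispatched first
    name: str = field(compare=False)
    slots: int = field(compare=False)  # CPU slots required
    bound_to: Optional[str] = field(default=None, compare=False)

class Cluster:
    def __init__(self, nodes):
        self.free = dict(nodes)        # node name -> free CPU slots
        self.queue = []                # heap of pending jobs

    def submit(self, job):
        heapq.heappush(self.queue, job)

    def dispatch(self):
        placed, deferred = [], []
        while self.queue:
            job = heapq.heappop(self.queue)
            # Honor binding if requested; otherwise consider any node.
            candidates = [job.bound_to] if job.bound_to else list(self.free)
            node = next((n for n in candidates
                         if self.free.get(n, 0) >= job.slots), None)
            if node:
                self.free[node] -= job.slots
                placed.append((job.name, node))
            else:
                deferred.append(job)   # no capacity yet; try again later
        for job in deferred:           # a failed job would re-queue this way too
            heapq.heappush(self.queue, job)
        return placed

cluster = Cluster({"node1": 8, "node2": 4})
cluster.submit(Job(priority=1, name="thermal", slots=4, bound_to="node2"))
cluster.submit(Job(priority=0, name="crash-sim", slots=8))
print(cluster.dispatch())  # high-priority job placed first
```

A production workload manager layers fair-share policies, checkpointing, and failure detection on top of this basic loop, but the placement logic is the same idea.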
One of the most common concerns of engineering departments implementing clusters for the first time is management. Engineering managers worry about becoming an ad hoc IT department: needing specialized staff to design, build and manage the clusters, and then devoting ongoing resources to running them. Advances at the microprocessor and system-software levels mean none of that is necessary.
Collaboration between hardware and software makers helps new HPC cluster users get their work done more easily, quickly, and effectively by moving from the limitations of desktops and workstations to the increased productivity potential of clusters. A new generation of hyper-threaded processors delivers the raw performance needed to solve large-scale problems faster. The result: a performance increase of 2x over the previous generation, so designs are rendered faster, data is displayed more quickly and with higher fidelity, and visual comparisons are completed sooner. Additionally, a common design specification helps eliminate variances for testing and support.
Complementing these gains on the hardware side are software suppliers whose products automatically provision jobs, manage nodes, manage workloads, schedule applications in parallel and dynamically schedule tasks. Integrated scheduling and user consoles are important new advances here. In this way, for an increasing number of engineering departments, the productivity, scale and resource efficiencies offered by HPC clusters have created a real alternative to today's design bottlenecks.
Tom Zsolt, PE, is vice president of Platform Computing, a maker of software to create and manage computing clusters. http://www.platform.com/