Kestrel is a CREATE simulation tool for the virtual simulation of fixed-wing aircraft. The primary goal of the tool is to provide rapid engineering analysis so that simulation can influence early-stage development. This demands rapid turnaround time, which in turn requires efficient user interfaces and advanced parallel simulations. The current Kestrel code uses AVUS (formerly Cobalt60), developed by the Air Force Research Laboratory, Air Vehicles Directorate (AFRL/RB), as the basis for its kCFD flow solver. AVUS is a CFD code that uses a cell-centered, finite-volume approach to solve the unsteady, three-dimensional, compressible Reynolds-Averaged Navier-Stokes (RANS) equations on hybrid unstructured grids. The original Cobalt60 code was parallelized by Grismer et al. The structural dynamics code in Kestrel is modeled using a modal representation.

In order for Kestrel to meet its goal of informing early-phase design, it must be able to leverage the most efficient compute architectures available. The linear solvers in Kestrel must therefore scale to systems with 10,000 or more cores.

Massively parallel systems exhibit at least three distinct forms of hardware parallelism, all of which must be utilized effectively to achieve high performance. These levels of parallelism are exposed to the developers of high-performance applications, but in the absence of better software tools from vendors, the applications and/or libraries themselves must be adapted to exploit them. The three levels of parallelism, combined in the sketch that follows the list, are:

  1. Distributed-Memory (Inter-Node) Parallelism among processors that cannot efficiently access data resident in each other's physical memory;
  2. Shared-Memory (Intra-Node) Parallelism among processors that share cache and/or physical memory, e.g., multiple cores on a chip (socket), and multiple sockets on a Symmetric Multi-Processing (SMP) node; and
  3. SIMD Parallelism, either in the form of short-vector SIMD instruction sets such as Streaming SIMD Extensions (SSE), AltiVec, Advanced Vector Extensions (AVX), and Larrabee New Instructions (LRBni), or in the Single Instruction, Multiple Thread (SIMT) style of fine-grained parallelism in GPUs.
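
As a minimal illustration of how these three levels compose in practice, the following C sketch combines MPI (distributed memory), OpenMP (shared memory), and AVX intrinsics (SIMD) in a single vector update. The arrays, sizes, and operation are hypothetical and are not taken from Kestrel.

```c
/* Hedged sketch: the three levels of hardware parallelism combined.
 * Compile with e.g.: mpicc -fopenmp -mavx sketch.c -o sketch */
#include <mpi.h>
#include <immintrin.h>
#include <stdlib.h>

#define N 1024  /* local (per-rank) array length; multiple of 4 */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);               /* level 1: distributed memory */

    double *x = aligned_alloc(32, N * sizeof(double));
    double *y = aligned_alloc(32, N * sizeof(double));
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* level 2: shared memory, threads across the cores of one node */
    #pragma omp parallel for
    for (int i = 0; i < N; i += 4) {
        /* level 3: SIMD, four doubles per AVX instruction */
        __m256d xv = _mm256_load_pd(&x[i]);
        __m256d yv = _mm256_load_pd(&y[i]);
        _mm256_store_pd(&y[i], _mm256_add_pd(xv, yv));
    }

    /* combine the per-rank partial sums across nodes */
    double local = 0.0, global = 0.0;
    for (int i = 0; i < N; i++) local += y[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}
```

In a production solver the MPI ranks would own mesh partitions and the reduction would combine residual norms, but the nesting, ranks around threads around vector lanes, is the same.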

To address these levels of parallelism, this effort developed optimizations to improve the performance and scalability of government codes, particularly Kestrel, the CREATE fixed-wing simulation tool described above. To leverage the computational capabilities of new and emerging architectures, such codes must be redesigned to exploit distributed-memory, shared-memory, and SIMD parallelism. This effort focused on optimizations of the CFD solver and the fluid-structure interaction (FSI) coupling in Kestrel.

The primary accomplishments of this work included:

  • The development and implementation of a multi-tiered Gauss-Seidel kernel that increases the number of operations performed between cache misses; a blocked-sweep sketch of the idea follows this list. This addresses low intra-node shared-memory performance caused by limited memory bandwidth. The scheme is also used to improve scalability across multiple distributed-memory nodes.
  • SIMD optimizations to the CFD solver to further improve the performance of Kestrel. This includes refactoring the data structures (see the structure-of-arrays sketch after this list) and implementing the optimizations in the Gauss-Seidel kernel.
  • The exploration of GPU and Xeon Phi kernels for the Gauss-Seidel (or similar) solver in Kestrel. Host/accelerator bandwidth is a major limiting concern for the existing Gauss-Seidel implementation.
  • The exploration of the expected performance of Krylov (KSP) solvers from alternate linear algebra libraries (see the solver-driver sketch after this list). These could benefit from increased local work per iteration. However, for the example Kestrel use cases, these schemes do not exhibit a significant performance improvement, because Gauss-Seidel already converges well for these test cases.
  • The exploration of an ADI-like scheme for unstructured solvers that reduces the number of iterations required for convergence compared to Gauss-Seidel; a line-solve sketch follows this list.
  • The implementation and demonstration of a sub-iteration-free scheme for the FSI coupling. This addresses a significant bottleneck in the coupled Kestrel solver.
  • SIMD optimization of the FSI and computational structural dynamics (CSD) implementations to further improve the SMP performance of Kestrel.
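
The multi-tiered Gauss-Seidel idea can be sketched as a cache-blocked sweep: unknowns are grouped into blocks small enough to stay cache-resident, and each block is swept several times before moving on, so more operations are performed per cache miss. The C sketch below assumes a generic CSR-stored system; the actual data layout and tiering policy in kCFD may differ, and all names are illustrative.

```c
/* Hedged sketch: cache-blocked ("tiered") Gauss-Seidel sweeps on a CSR
 * matrix. Each block of rows is re-swept while it is still resident in
 * cache, increasing the work done per cache miss. */
typedef struct {
    int           n;     /* number of rows */
    const int    *ptr;   /* CSR row pointers, length n+1 */
    const int    *col;   /* CSR column indices */
    const double *val;   /* CSR values */
    const double *diag;  /* diagonal entries, length n */
} CsrMatrix;

void blocked_gauss_seidel(const CsrMatrix *A, const double *b, double *x,
                          int block_rows, int inner_sweeps)
{
    for (int start = 0; start < A->n; start += block_rows) {
        int end = start + block_rows < A->n ? start + block_rows : A->n;
        /* Re-sweep this cache-resident block before touching the next. */
        for (int sweep = 0; sweep < inner_sweeps; sweep++) {
            for (int i = start; i < end; i++) {
                double sum = b[i];
                for (int k = A->ptr[i]; k < A->ptr[i + 1]; k++) {
                    int j = A->col[k];
                    if (j != i)
                        sum -= A->val[k] * x[j];  /* uses latest x values */
                }
                x[i] = sum / A->diag[i];
            }
        }
    }
}
```

Choosing block_rows so that a block's rows, column indices, and unknowns fit in the last-level cache is the tuning knob; inner_sweeps trades extra flops for fewer passes over main memory.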
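The data-structure refactoring that enables SIMD is typically a move from an array of structures (AoS) to a structure of arrays (SoA), so that each field is contiguous and amenable to packed vector loads. The field names below are hypothetical and are not Kestrel's.

```c
/* Hedged sketch of the AoS-to-SoA refactoring that typically enables
 * SIMD vectorization; field names are illustrative only. */

/* Before: array of structures. The stride between successive 'rho'
 * values defeats unit-stride vector loads. */
typedef struct { double rho, u, v, w, e; } CellAoS;

/* After: structure of arrays. Each field is contiguous, so the
 * compiler can issue packed SIMD loads and stores. */
typedef struct {
    double *rho, *u, *v, *w, *e;
    int n;
} CellsSoA;

void scale_density_soa(CellsSoA *c, double factor)
{
    /* Unit-stride loop; auto-vectorizes under -O3 on most compilers. */
    #pragma omp simd
    for (int i = 0; i < c->n; i++)
        c->rho[i] *= factor;
}
```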
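KSP is the Krylov solver interface of the PETSc library, one plausible example of the alternate linear algebra libraries mentioned above. The sketch below, assuming a recent PETSc release with the PetscCall macro, shows the kind of minimal solver driver such an exploration involves; matrix assembly is omitted and the solver choice is illustrative.

```c
/* Hedged sketch: driving a Krylov solve through PETSc's KSP interface.
 * A and b are assumed to be an assembled Mat and Vec. */
#include <petscksp.h>

PetscErrorCode solve_with_ksp(Mat A, Vec b, Vec x)
{
    KSP ksp;
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));  /* operator and preconditioner matrix */
    PetscCall(KSPSetType(ksp, KSPGMRES));   /* e.g., restarted GMRES */
    PetscCall(KSPSetFromOptions(ksp));      /* allow -ksp_* runtime overrides */
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    return 0;
}
```

The appeal of such solvers is more local work per global synchronization; for the test cases studied here, that extra work did not pay off against a well-converging Gauss-Seidel.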
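An ADI-like scheme on an unstructured grid typically extracts implicit "lines" of strongly coupled cells and solves a tridiagonal system directly along each line, resolving those couplings in one pass rather than over many point Gauss-Seidel iterations. A minimal Thomas-algorithm line solve, with illustrative array names, might look like:

```c
/* Hedged sketch: direct tridiagonal (Thomas) solve along one implicit
 * line of cells, the building block of an ADI-like unstructured scheme. */
void thomas_solve(int n,
                  const double *a,   /* sub-diagonal,   a[0] unused   */
                  const double *b,   /* main diagonal                 */
                  const double *c,   /* super-diagonal, c[n-1] unused */
                  const double *d,   /* right-hand side               */
                  double *x,         /* solution, length n            */
                  double *cp, double *dp)  /* scratch, length n       */
{
    /* forward elimination */
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (int i = 1; i < n; i++) {
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    /* back substitution */
    x[n - 1] = dp[n - 1];
    for (int i = n - 2; i >= 0; i--)
        x[i] = dp[i] - cp[i] * x[i + 1];
}
```

Each line solve costs O(n) operations, and independent lines can be solved in parallel, which is why such a scheme can cut the iteration count relative to point Gauss-Seidel.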