Kalman filtering is a statistical estimation algorithm that combines measurements observed over time with a physical model to minimize estimation errors caused by uncertainty in the measurements (measurement noise) and in the physical process (process noise). Kalman filtering is used in a wide range of applications, including guidance, navigation, and control of vehicles, signal processing, robot motion planning, trajectory optimization, modeling of the central nervous system, and, most pertinent to this proposal, charged particle track reconstruction.
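
As a concrete illustration of the predict/update cycle described above, the following is a minimal sketch of a scalar Kalman filter in Python. It assumes a constant-state model and known noise variances. The names `kalman_step` and `run_filter` and the default parameter values are illustrative assumptions and are not part of the proposed toolkit.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: prior state estimate and its variance
    z:    new measurement
    q, r: process-noise and measurement-noise variances (assumed known)
    """
    # Predict: the assumed model is "the state stays constant", so only
    # the uncertainty grows, by the process-noise variance.
    x_pred = x
    p_pred = p + q

    # Update: blend prediction and measurement, weighted by their variances.
    k = p_pred / (p_pred + r)          # Kalman gain, between 0 and 1
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


def run_filter(measurements, x0=0.0, p0=1.0, q=1e-4, r=0.25):
    """Fold a sequence of measurements into a single estimate."""
    x, p = x0, p0
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
    return x, p
```

Calling `run_filter([1.0] * 20)` moves the estimate from the initial guess of 0 toward the measured value while the variance shrinks, which is the error-minimizing behavior the paragraph above refers to.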

Although the Kalman filtering algorithm itself is quite simple, the technical details of implementing a Kalman filter vary between applications: every application has its own physics equations, target architecture, and runtime requirements. For instance, when implementing a Kalman filter for vehicle guidance, the goal is to develop an online, real-time Kalman filter that can estimate the position of a single object using inputs from O(10) different sensors, all in a continuous-time integration loop. In contrast, with upwards of 1000 tracks per event (further amplified by the combinatorial nature of track reconstruction) and nuclear physics accelerators operating at rates between 10 kHz and 10 MHz depending on the collision species, the issues facing Kalman filtering in charged particle track reconstruction are largely due to the sheer number of relatively small Kalman filtering operations that need to take place.

Moreover, while the internal properties of a Kalman filter make it a good candidate for acceleration using parallelization and GPUs, most accelerated implementations target a single compute architecture. Next-generation exascale HPC facilities have started to use hybrid architectures with processors and compute accelerators from multiple vendors (e.g., Intel Xe HPC/Ponte Vecchio, Nvidia A100, AMD Instinct). This shift is expected to lead to a growing number of GPU architectures in many compute environments. Therefore, in order to exploit a wide range of leadership computing facilities, any GPU parallelization effort should support a range of hardware architectures and programming environments. It is not feasible for application developers to hand-optimize every algorithm in every application for the full range of current and future hardware architectures.

In this STTR project, RNET and WSU will build a flexible Kalman filter code generator that will produce highly optimized Kalman filter code given only a set of physics equations, a target architecture, and a description of the use case. Using operator fusion, operator specialization, and optimizations for problem size and sparsity, together with support for batch processing and distributed architectures, the framework will generate Kalman filtering code tailored to the specific features and requirements of the given problem.
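
To illustrate the kind of size and sparsity specialization a code generator can apply when the model is known ahead of time, the toy sketch below emits an unrolled matrix-vector product for a fixed state-transition matrix. It is only an illustration under stated assumptions: the function `generate_matvec`, the constant-velocity matrix `F`, and the choice of Python as the output language are hypothetical and do not describe the design of the proposed framework.

```python
def generate_matvec(name, matrix):
    """Emit Python source for y = matrix @ x, specialized to `matrix`.

    Zero entries are dropped and multiplications by 1 are elided, so the
    generated function contains only the arithmetic that is needed.
    """
    rows = []
    for row in matrix:
        terms = []
        for j, a in enumerate(row):
            if a == 0:
                continue                      # skip structural zeros
            terms.append(f"x[{j}]" if a == 1 else f"{a!r} * x[{j}]")
        rows.append(" + ".join(terms) if terms else "0.0")
    return f"def {name}(x):\n    return [" + ", ".join(rows) + "]"


# Hypothetical constant-velocity model with time step 0.5:
# position += 0.5 * velocity, velocity unchanged.
F = [[1, 0.5],
     [0, 1]]

source = generate_matvec("predict_state", F)

# Compile the generated source into a callable function.
namespace = {}
exec(source, namespace)
predict_state = namespace["predict_state"]
```

For this matrix the generated function performs one multiplication and one addition instead of the four multiplications and two additions of a generic 2x2 product. A production generator would apply the same idea to the full filter equations and emit code for the target architecture rather than Python.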

The key features of this product will be:

  • Support for a wide range of state estimation algorithms. The toolkit will support the generation of a number of algorithms, including linear and nonlinear Kalman filters, the unscented Kalman filter, and moving horizon estimation.
  • A simple input interface. The toolkit will include a domain-specific language that allows users to enter physics equations and problem definitions in an intuitive way.
  • Code optimized for the user's architecture. The toolkit will generate code that is highly optimized for the user's target architecture. Porting code to a different architecture will be as simple as changing a command-line argument.
  • Support for a range of runtimes. The toolkit will include a range of runtime environments, including support for fast online estimation, distributed batch processing, and the combinatorial Kalman filter.