Empirical Dynamic Modeling (EDM) is a mathematical framework designed for studying nonlinear dynamical systems. EDM is based upon the concept of state-space reconstruction (SSR). Takens' theorem states that the attractor manifold of a multivariate dynamical system can be reconstructed from time-lagged coordinates of a single time series variable. Figure 1 illustrates the concept of state space reconstruction.

EDM encompasses multiple functions, but we focused on two main techniques, the simplex projection, which uses individual variables to create univariate embeddings to predict itself, and convergent cross-mapping (CCM) a method for nonlinear causal inference. Simplex projection is a nonlinear forecasting algorithm often used for estimating the dimensionality of a dynamical system using its prediction skill as a selection criterion. Simplex projection uses a single time series, which splits into two halves for model building and cross-validation with withheld data.

Then, CCM determines the existence and strength of causality between two-time series variables. CCM calculations are similar to simplex projection, with the main difference being that instead of predicting within a single time series, CCM predicts across time series using one-time series to predict another.

If time series *y* can be predicted from time series *x* with significant accuracy, we conclude that *y CCM causes x*.

In this research demonstration, we utilize CCM to infer all the causal relationships between every neuron in an entire fish brain and construct a causal map that describes the dynamic causal interactions among all neurons of an entire brain. For this purpose, we have recorded the neural activity (*i.e.* firing rate) of an entire larval zebrafish brain in a single-neuron resolution by using light sheet fluorescence microscopy.

The original implementation of EDM, cppEDM, has mostly been used for individual time series of relatively short length and mostly small numbers of variables due to its computational cost. Since a larval zebrafish brain contains approximately 100 thousand neurons, a staggering number of 10 billion cross mappings need to be performed in total. CCM of this enormous scale has never been achieved so far because of the sheer amount of computation required. An early attempt with the ABCI supercomputer, a whole brain run took 23.5 hours using 512 out of the 1,088 compute nodes to complete this run.

The goal of this research is to develop mpEDM, a highly scalable and optimized implementation of EDM that is able to analyze the whole zebrafish brain dataset within a reasonable time and computational resources.

**ArrayFire's role in the application**

In the initial stage of our research, we profiled cppEDM and found out that the primary bottleneck in CCM computation is the k-nearest neighbors (kNN) search in the state space. To accelerate the kNN search, we modified the CCM algorithm to perform an all-to-all kNN search first, store the indices and distances of the nearest neighbors as a table, and then lookup this table. On the GPU, we utilized ArrayFire’s nearestNeighbor() function to implement the all kNN search. We benchmarked several GPU-based kNN implementations available online, but ArrayFire was the best-performing for the parameters we are interested in (small k and low dimension). On the CPU, we implemented our own parallel all kNN search using OpenMP.

Since the causal inference between different pairs of time series can be computed independently, we distribute the work across multiple GPUs. This was relatively easy to implement using ArrayFire’s automatic memory management and setDevice() function. Furthermore, we built a simple master-worker framework using MPI to distribute the work among multiple compute nodes.

## Performance results

We evaluated the performance of mpEDM on the AI Bridging Cloud Infrastructure (ABCI), the 2nd most powerful supercomputer in Japan as of May 2021. An ABCI compute node is equipped with two Intel Xeon Gold 6148 CPUs and four NVIDIA V100 GPUs. mpEDM demonstrated significantly higher performance compared to cppEDM. cppEDM took 8.5 hours to analyze the Fish1_Normo dataset, a real-world dataset recorded from a larval zebrafish, using 512 compute nodes. In contrast, mpEDM took only 20 seconds to analyze the same dataset and number of compute nodes, which indicates 1,530x speedup compared to cppEDM. Moreover, mpEDM analyzed two even larger datasets as shown in Table 1.

Figure 2 shows the speedup of the GPU version over the CPU version with respect to the number of time steps. Evidently, the GPU speedup increases with the number of time steps.

Dataset | # of Time Steps | # of Time Series | cppEDM Runtime | mpEDM Runtime |

Fish_Normo | 1,450 | 53,053 | 8.5h | 20s |

Subject6 | 3,780 | 92,538 | N/A | 101s |

Subject11 | 8,528 | 101,729 | N/A | 199s |

## Scientific results

The scientific question investigated was on the nature of information sharing, signal complexity at every scale, and the causal relationships at single-neuron resolution to the whole-brain scale. To answer this question we studied the transparent larval zebrafish in low oxygen avoiding task (A) and recorded the brain activity at single-neuron resolution (B).

This was done using transgenic fish that had a fluorescent brain activity reporter GCAMP6 that measures intracellular calcium in neurons giving a fluorescent signal proportional to calcium concentration which is a surrogate for neuronal electrical activity. These calcium signals were imaged by selective plane illumination microscopy (SPIM) otherwise also known as light-sheet microscopy (B).

The simplex function measures signal complexity as the embedding dimension, with more complex signals with higher dimensionality. The results show that as the animals transition from normoxic to hypoxic conditions, the animal goes into an escape response and neuronal activity in the brain is simplified and dimensionality decreases for the majority of neurons (B, C).

The causal relationships among neurons also change from normal oxygen conditions when transitioning into a hypoxic escape response. Brain connectivity is distributed and shows local coordination (E) and as it transitions into hypoxic conditions coordination and global causal coherence are increased in the escape response (F).

From this data, one can find select low dimensional manifolds constructed from the activity from a select small number of neurons that allow the prediction of fish turn behaviors at least 0.5 seconds ahead of them occurring (G). These low dimensional manifolds capture the intrinsic low dimensional geometry of the behavior of both neurons and the whole organism and provide equation-free mathematical models based on the data geometry alone.

## Conclusion

Our mpEDM project is a parallel distributed implementation of EDM optimized for execution on modern GPU-centric supercomputers. mpEDM took only 20 seconds to finish the causal inference of a dataset containing the activity of 53,053 zebrafish neurons on 512 ABCI nodes. This is 1,530x faster than cppEDM, the current standard implementation of EDM. Moreover, mpEDM was able to analyze a 13x larger dataset in 199 seconds. This is the largest CCM causal inference achieved to date.

Our paper was presented at the ICPADS 2020 conference and a preprint version of the paper is available on arXiv.

Special thanks to the authors who contributed to this project and paper, listed below:

Nara Institute of Science and Technology, Nara, Japan

Florida, USA

University of California, San Diego, California, USA

National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan

National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan

Salk Institute for Biological Studies, California, USA