We formulate an equivalence between machine learning and statistical data assimilation as widely used in the physical and biological sciences. Under this correspondence, the layer number in a feedforward artificial network is the analog of time in data assimilation. This connection has been noted in the machine learning literature. We add a perspective that expands on how methods from statistical physics and aspects of Lagrangian and Hamiltonian dynamics play a role in how networks can be trained and designed. Within the discussion of this equivalence, we show that adding more layers (making the network deeper) is analogous to adding temporal resolution in a data assimilation framework. Extending this equivalence to recurrent networks is also discussed. We explore how one can find a candidate for the global minimum of the cost functions in the machine learning context using a method from data assimilation. Calculations on simple models from both sides of the equivalence are reported. Also discussed is a framework in which the time or layer label is taken to be continuous, providing a differential equation, the Euler-Lagrange equation and its boundary conditions, as a necessary condition for a minimum of the cost function. This shows that the problem being solved is a two-point boundary value problem familiar in the discussion of variational methods. The use of continuous layers is denoted "deepest learning." These problems respect a symplectic symmetry in continuous layer phase space. Both Lagrangian versions and Hamiltonian versions of these problems are presented. Their well-studied implementation in discrete time/layer, while respecting the symplectic structure, is addressed. The Hamiltonian version provides a direct rationale for backpropagation as a solution method for a certain two-point boundary value problem.
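To illustrate the reading of backpropagation as a two-point boundary value problem, the sketch below uses a toy scalar network. The state is fixed at layer 0 by the input, the costate (the derivative of the cost with respect to the state) is fixed at the final layer by the output error, and the backward sweep connects the two. The tanh layer map, the quadratic cost on the final layer, and all function names are assumptions chosen for brevity; they are not taken from the paper.

```python
import math

def forward(x0, weights):
    """Forward sweep: x[l+1] = tanh(w[l] * x[l]), starting from the input x0."""
    xs = [x0]
    for w in weights:
        xs.append(math.tanh(w * xs[-1]))
    return xs

def cost(x0, weights, target):
    """Quadratic cost on the final layer only (an assumed, illustrative choice)."""
    return 0.5 * (forward(x0, weights)[-1] - target) ** 2

def backprop(x0, weights, target):
    """Backward sweep for the costate p[l] = dC/dx[l]; returns dC/dw[l] per layer."""
    xs = forward(x0, weights)
    p = xs[-1] - target                   # boundary condition at the final layer
    grads = [0.0] * len(weights)
    for l in reversed(range(len(weights))):
        slope = 1.0 - xs[l + 1] ** 2      # derivative of tanh evaluated at layer l+1
        grads[l] = p * slope * xs[l]      # dC/dw[l]
        p = p * slope * weights[l]        # costate one layer earlier
    return grads
```

One way to check the backward sweep is to compare `backprop(0.5, [0.7, -1.3, 0.4, 1.1], 0.2)` with central finite differences of `cost` in each weight; the two should agree to within the finite-difference error.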
Most data-based state and parameter estimation methods require suitable initial values or guesses to achieve convergence to the desired solution, which typically is a global minimum of some cost function. However, other stable solutions (e.g., local minima) may exist and provide suboptimal or even wrong estimates. Here, we demonstrate for a 9-dimensional Lorenz-96 model how to characterize the basin size of the global minimum when applying a particular optimization-based estimation algorithm. We compare three different strategies for generating suitable initial guesses, and we investigate the dependence of the solution on the given trajectory segment (underlying the measured time series). To address the question of how many state variables have to be measured for optimal performance, different types of multivariate time series are considered consisting of 1, 2, or 3 variables. Based on these time series, the local observability of state variables and parameters of the Lorenz-96 model is investigated and confirmed using delay coordinates. This result is in good agreement with the observation that correct state and parameter estimation results are obtained if the optimization algorithm is initialized with initial guesses close to the true solution. In contrast, initialization with other exact solutions of the model equations (different from the true solution used to generate the time series) typically fails, i.e., the optimization procedure ends up in local minima different from the true solution. Initialization using random values in a box around the attractor exhibits success rates depending on the number of observables and the available time series (trajectory segment). © 2015 AIP Publishing LLC.
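For orientation, the setup described above can be sketched in a few lines: a 9-dimensional Lorenz-96 model, a trajectory segment, a multivariate time series of selected variables, and a random initial guess drawn from a box. The forcing value F = 8, the RK4 integrator, the step size, and the use of the trajectory's bounding box as a stand-in for "a box around the attractor" are all assumptions, since the abstract does not specify them; the function names are hypothetical.

```python
import random

def lorenz96_rhs(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, indices taken cyclically."""
    d = len(x)
    return [(x[(i + 1) % d] - x[(i - 2) % d]) * x[(i - 1) % d] - x[i] + forcing
            for i in range(d)]

def rk4_step(x, dt, forcing=8.0):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(slope, h):
        return [xi + h * si for xi, si in zip(x, slope)]
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(k1, 0.5 * dt), forcing)
    k3 = lorenz96_rhs(shifted(k2, 0.5 * dt), forcing)
    k4 = lorenz96_rhs(shifted(k3, dt), forcing)
    return [xi + dt * (a + 2 * b + 2 * c + e) / 6.0
            for xi, a, b, c, e in zip(x, k1, k2, k3, k4)]

def simulate(x0, n_steps, dt=0.01, forcing=8.0):
    """Return the trajectory segment as a list of n_steps + 1 states."""
    traj = [list(x0)]
    for _ in range(n_steps):
        traj.append(rk4_step(traj[-1], dt, forcing))
    return traj

def observe(traj, observed=(0,)):
    """Keep only the measured variables, e.g. 1, 2, or 3 of the 9 components."""
    return [[state[i] for i in observed] for state in traj]

def random_guess(traj, rng, margin=1.0):
    """Draw an initial state uniformly from a box enclosing the trajectory (assumed strategy)."""
    d = len(traj[0])
    lo = [min(s[i] for s in traj) - margin for i in range(d)]
    hi = [max(s[i] for s in traj) + margin for i in range(d)]
    return [rng.uniform(lo[i], hi[i]) for i in range(d)]
```

A typical use would be `traj = simulate(x0, 2000)` followed by `observe(traj, observed=(0, 3, 6))` for a three-variable time series, with `random_guess(traj, random.Random(0))` supplying one candidate initialization for the optimizer.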
Cardiac rhythm management devices provide therapies for both arrhythmias and resynchronisation but not heart failure, which affects millions of patients worldwide. This paper reviews recent advances in biophysics and mathematical engineering that provide a novel technological platform for addressing heart disease and enabling beat-to-beat adaptation of cardiac pacing in response to physiological feedback. The technology consists of silicon hardware central pattern generators (hCPGs) that may be trained to emulate accurately the dynamical response of biological central pattern generators (bCPGs). We discuss the limitations of present CPGs and appraise the advantages of analog over digital circuits for application in bioelectronic medicine. To test the system, we have focused on the cardio-respiratory oscillators in the medulla oblongata that modulate heart rate in phase with respiration to induce respiratory sinus arrhythmia (RSA). We describe here a novel, scalable hCPG comprising physiologically realistic (Hodgkin-Huxley type) neurones and synapses. Our hCPG comprises two neurones that antagonise each other to provide rhythmic motor drive to the vagus nerve to slow the heart. We show how recent advances in modelling allow the motor output to adapt to physiological feedback such as respiration. In rats, we report on the restoration of RSA using an hCPG that receives diaphragmatic electromyography input and uses it to stimulate the vagus nerve at specific time points of the respiratory cycle to slow the heart rate. We have validated the adaptation of stimulation to alterations in respiratory rate. We demonstrate that the hCPG is tuneable in terms of the depth and timing of the RSA relative to respiratory phase. These pioneering studies will now permit an analysis of the physiological role of RSA as well as any potential therapeutic use in cardiac disease.
Estimating the behavior of a network of neurons requires accurate models of the individual neurons along with accurate characterizations of the connections among them. Whereas for a single cell, measurements of the intracellular voltage are technically feasible and sufficient to characterize a useful model of its behavior, making sufficient numbers of simultaneous intracellular measurements to characterize even small networks is infeasible. This paper builds on prior work on single neurons to explore whether knowledge of the time of spiking of neurons in a network, once the nodes (neurons) have been characterized biophysically, can provide enough information to usefully constrain the functional architecture of the network: the existence of synaptic links among neurons and their strength. Using standardized voltage and synaptic gating variable waveforms associated with a spike, we demonstrate that the functional architecture of a small network of model neurons can be established.
We present a method for using measurements of membrane voltage in individual neurons to estimate the parameters and states of the voltage-gated ion channels underlying the dynamics of the neuron's behavior. Short injections of a complex time-varying current provide sufficient data to determine the reversal potentials, maximal conductances, and kinetic parameters of a diverse range of channels, representing tens of unknown parameters and many gating variables in a model of the neuron's behavior. These estimates are used to predict the response of the model at times beyond the observation window. This method of data assimilation extends to the general problem of determining model parameters and unobserved state variables from a sparse set of observations, and may be applicable to networks of neurons. We describe an exact formulation of the tasks in nonlinear data assimilation when one has noisy data, errors in the models, and incomplete information about the state of the system when observations commence. This is a high dimensional integral along the path of the model state through the observation window. In this article, a stationary path approximation to this integral, using a variational method, is described and tested employing data generated using neuronal models comprising several common channels with Hodgkin-Huxley dynamics. These numerical experiments reveal a number of practical considerations in designing stimulus currents and in determining model consistency. The tools explored here are computationally efficient and have paths to parallelization that should allow large individual neuron and network problems to be addressed.
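A stationary path approximation of this kind looks for paths, together with parameter values, that make an action (a cost over the whole path) stationary. The sketch below evaluates one common form of such an action, assuming Gaussian measurement and model errors, so that the action is a weighted sum of squared measurement residuals and squared model residuals. The scalar toy model, the Euler discretization, the weights `r_meas` and `r_model`, and all names are illustrative assumptions, not the article's neuron model; in the article's setting only the voltage is observed, so the measurement sum would run over that component alone.

```python
def model_step(x, rate, dt=0.1):
    """One Euler step of the toy model dx/dt = rate * x * (1 - x)."""
    return x + dt * rate * x * (1.0 - x)

def generate_path(x0, rate, n_steps, dt=0.1):
    """Noise-free model path, used here as synthetic data."""
    path = [x0]
    for _ in range(n_steps):
        path.append(model_step(path[-1], rate, dt))
    return path

def action(path, rate, data, r_meas=1.0, r_model=10.0, dt=0.1):
    """Measurement-error term plus model-error term for a candidate path and parameter."""
    meas = sum((x - y) ** 2 for x, y in zip(path, data))
    model = sum((path[n + 1] - model_step(path[n], rate, dt)) ** 2
                for n in range(len(path) - 1))
    return 0.5 * r_meas * meas + 0.5 * r_model * model
```

With noise-free data generated by `generate_path`, the action vanishes at the true path and parameter and is positive for a wrong parameter or a perturbed path, which is the property a variational search over paths and parameters relies on.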
We examine the use of synchronization as a mechanism for extracting parameter and state information from experimental systems. We focus on important aspects of this problem that have received little attention previously and we explore them using experiments and simulations with the chaotic Colpitts oscillator as an example system. We explore the impact of model imperfection on the ability to extract valid information from an experimental system. We compare two optimization methods: an initial value method and a constrained method. Each of these involves coupling the model equations to the experimental data in order to regularize the chaotic motions on the synchronization manifold. We explore both time-dependent and time-independent coupling and discuss the use of periodic impulse coupling. We also examine both optimized and fixed (or manually adjusted) coupling. For the case of an optimized time-dependent coupling function u(t) we find a robust structure which includes sharp peaks and intervals where it is zero. This structure shows a strong correlation with the location in phase space and appears to depend on noise, imperfections of the model, and the Lyapunov direction vectors. For time-independent coupling we find the counterintuitive result that often the optimal rms error in fitting the model to the data initially increases with coupling strength. Comparison of this result with that obtained using simulated data may provide one measure of model imperfection. The constrained method with time-dependent coupling appears to have benefits in synchronizing long data sets with minimal impact, while the initial value method with time-independent coupling tends to be substantially faster, more flexible, and easier to use. We also describe a method of coupling which is useful for sparse experimental data sets. Our use of the Colpitts oscillator allows us to explore in detail the case of a system with one positive Lyapunov exponent. 
The methods we explored are easily extended to driven systems such as neurons with time-dependent injected current. They are expected to be of value in nonchaotic systems as well. Software is available on request.
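The basic mechanism, coupling the model equations to the data through the measured variable, can be sketched as follows. The Lorenz-63 system is used here purely as a stand-in because its equations are standard; it is not the Colpitts oscillator studied in the paper. The constant coupling strength `k`, the choice of x as the only measured variable, the initial conditions, and the function names are likewise assumptions, and the "data" are simply a second copy of the same model integrated alongside it.

```python
def lorenz63(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Standard Lorenz-63 vector field (stand-in system)."""
    return sigma * (y - x), x * (rho - z) - y, x * y - beta * z

def coupled_rhs(state, k):
    """Data system (first three entries) drives the model copy (last three) through x only."""
    xd, yd, zd, xm, ym, zm = state
    dxd, dyd, dzd = lorenz63(xd, yd, zd)
    dxm, dym, dzm = lorenz63(xm, ym, zm)
    dxm += k * (xd - xm)   # time-independent coupling of the model to the data
    return [dxd, dyd, dzd, dxm, dym, dzm]

def rk4_step(state, k, dt):
    """One classical fourth-order Runge-Kutta step of the combined system."""
    def shifted(slope, h):
        return [s + h * v for s, v in zip(state, slope)]
    k1 = coupled_rhs(state, k)
    k2 = coupled_rhs(shifted(k1, 0.5 * dt), k)
    k3 = coupled_rhs(shifted(k2, 0.5 * dt), k)
    k4 = coupled_rhs(shifted(k3, dt), k)
    return [s + dt * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def sync_error(k, n_steps=8000, dt=0.005):
    """Largest component-wise gap between data and model after n_steps."""
    state = [1.0, 1.0, 1.0, -5.0, 7.0, 20.0]   # data and model start far apart
    for _ in range(n_steps):
        state = rk4_step(state, k, dt)
    return max(abs(state[i] - state[i + 3]) for i in range(3))
```

Comparing `sync_error(50.0)` with `sync_error(0.0)` shows the regularizing effect of the coupling: with sufficiently strong coupling the unmeasured model variables are expected to follow the data system, while without coupling the two chaotic trajectories stay apart.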