This year, I have been awarded the Hebb Award of the International Neural Network Society -- one of the two highest awards of the society, which recognizes leading contributions to understanding how learning works in real, biological brains.
Here is the abstract of the tutorial I will be giving on that subject, to an expert audience, at the International Joint Conference on Neural Networks (IJCNN11) this year. (IJCNN is the main joint meeting of INNS and of the IEEE on neural networks.)
Brain-Like Prediction, Decision and Control
This tutorial will give an updated report on progress towards building universal learning systems, which should be able to learn to make better and better predictions and decisions in the face of complexity, nonlinearity, unobserved dynamics and uncertainty as severe as what mammal brains have evolved to cope with. Some people imagine that such universal systems could not be possible, but the mammal brain is proof that they are. There is now some understanding of the mathematics which makes them possible. No one on earth has yet developed a complete design which can do this, but we do have a roadmap now for getting there. (See P. Werbos, Intelligence in the Brain: a theory of how it works and how to build it, Neural Networks, 22 (2009) 200-212, and www.werbos.com/Mind.htm.)
In 2008, NSF funded a major effort in Cognitive Optimization and Prediction (COPN) aimed at understanding and replicating these capabilities. The National Academy of Engineering has listed reverse-engineering the brain as one of the most important grand challenges in engineering for this century.
This tutorial will start with a kind of roadmap of design and research challenges and milestones, and key accomplishments to date, ranging from mathematical foundations, to the best available tools and applications, to the goal of “optimal vector intelligence,” and to the three further steps to get to cognitive optimization and prediction systems as powerful as the basic mammal brain – with some reference to new opportunities in neuroscience.
In the prediction domain, sloppy “hands on” data mining and incorrect conventional wisdom have led to some really serious errors, some of them important to the 2008 financial collapse. For time-series data or dynamic systems, like economies or engines, the best universal systems now available for nonlinear prediction (which still continue to beat the competitors) are the time-lagged recurrent network (TLRN) systems from Ford or Siemens. But truly optimal systems are not available yet even for the simple task of learning Y=f(X) for smooth vector functions f, even when learning from a fixed database. I will discuss how we could build on the classic work of Barron (still the best available for this task) to achieve this. It is important for research to find optimal ways to insert penalty functions, robustness and a better use of example-based methods into (neural net) models of f, rather than rely on extreme methods which do not learn about general cause-and-effect relations. This in turn would allow development of universal time-series prediction tools even more powerful than what Ford and Siemens now offer. Beyond the vector intelligence level, the recent breakthroughs of LeCun, of Fogel, and of Kozma and myself offer pathways to coping with spatial complexity. For example, under COPN funding, LeCun has broken world records in object recognition, phoneme recognition and language processing, using simple neural networks based on simple mathematical principles which we can easily take further.
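For readers who have not seen a TLRN, the following is a minimal sketch of the general form only: a prediction Y(t) = f(X(t), R(t-1)) together with a memory update R(t) = g(X(t), R(t-1)). It is not the Ford or Siemens design. The function names, the single tanh memory unit, and the hand-set weights are all assumptions made for illustration; a real system would learn the weights from data.

```python
import math

def tlrn_step(x, r_prev, w):
    """One time step of a toy time-lagged recurrent network.

    x      : current scalar input X(t)
    r_prev : scalar memory R(t-1) carried over from the previous step
    w      : dict of weights (hypothetical, hand-set values in this sketch)
    """
    # Prediction uses the current input and the memory from the previous step.
    y_hat = w["xy"] * x + w["ry"] * r_prev + w["by"]
    # Memory update uses the same two arguments.
    r_new = math.tanh(w["xr"] * x + w["rr"] * r_prev + w["br"])
    return y_hat, r_new

def run_tlrn(xs, w, r0=0.0):
    """Run the network over a sequence and return the list of predictions."""
    preds = []
    r = r0
    for x in xs:
        y_hat, r = tlrn_step(x, r, w)
        preds.append(y_hat)
    return preds

# Illustrative weights only; nothing here is trained.
WEIGHTS = {"xr": 1.0, "rr": 0.5, "br": 0.0, "xy": 0.3, "ry": 0.8, "by": 0.0}

if __name__ == "__main__":
    # Same current input (0.0) at the second step, different histories.
    print(run_tlrn([1.0, 0.0], WEIGHTS))
    print(run_tlrn([-1.0, 0.0], WEIGHTS))
```

The point of the sketch is that the second prediction differs between the two runs even though the current input is the same, because the memory R carries information about unobserved dynamics forward in time.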
In decision and control, adaptive dynamic programming (ADP) has made huge progress in academia lately, but divisions between communities have caused some important opportunities for further and faster progress in research to be missed. For example, in the control world, use of a prediction and state estimation module, and use of ADP’s original capabilities to cope with random disturbances, are essential to coping with many complex challenges and to understanding brain capabilities. In the operations research world, neural networks offer universal function approximation capabilities essential to better representation of value functions, the key bottleneck there. Many practical applications started in the 1990s – especially in aerospace, beyond the scope of conventional control or AI methods – are now seeing large, widespread benefits, not so well known in academia, in part because of proprietary issues. Multiple time intervals are one of the key steps up from optimal vector intelligence here to true brain-like intelligence; as time permits, I will briefly review Wunsch’s new results on ADP with multiple time-scales and prior work related to this goal, and ways to use ADP in “smart grid” research.
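As a rough illustration of what approximating a value function under random disturbances means, here is a small sketch in the spirit of a heuristic dynamic programming critic: the critic is repeatedly refitted towards U(x) + gamma * E[J(x')]. Everything specific is an assumption for the example: the scalar linear plant, the fixed feedback policy, the two-point disturbance (plus or minus SIGMA with equal probability), and the two-parameter critic J(x) = w0*x^2 + w1, which stands in for a neural network.

```python
# Toy plant x' = A*x + B*u + noise, fixed policy u = -K*x (all values hypothetical).
A, B, K = 0.9, 0.5, 0.4
R, SIGMA, GAMMA = 0.1, 0.2, 0.9

def utility(x):
    """One-step cost U = x^2 + R*u^2 under the fixed policy."""
    u = -K * x
    return x * x + R * u * u

def critic(w, x):
    """Approximate value function J(x) = w0*x^2 + w1."""
    return w[0] * x * x + w[1]

def expected_target(w, x):
    """U(x) + GAMMA * E[J(x')], averaging over the two disturbance values."""
    u = -K * x
    nxt = A * x + B * u
    future = 0.5 * (critic(w, nxt + SIGMA) + critic(w, nxt - SIGMA))
    return utility(x) + GAMMA * future

def fit_critic(xs, ys):
    """Least-squares fit of ys against the features (x^2, 1)."""
    fs = [x * x for x in xs]
    n = len(xs)
    mf, my = sum(fs) / n, sum(ys) / n
    cov = sum((f - mf) * (y - my) for f, y in zip(fs, ys))
    var = sum((f - mf) ** 2 for f in fs)
    w0 = cov / var
    return [w0, my - w0 * mf]

def train_critic(sweeps=500):
    """Alternate between building targets and refitting the critic."""
    xs = [-2.0 + 0.1 * i for i in range(41)]  # training states on a grid
    w = [0.0, 0.0]
    for _ in range(sweeps):
        ys = [expected_target(w, x) for x in xs]
        w = fit_critic(xs, ys)
    return w

if __name__ == "__main__":
    print(train_critic())
```

The constant term w1 is nonzero only because of the disturbance: it is the critic's estimate of the discounted cost that the noise will keep adding in the future. In this toy case the exact value function has the same quadratic-plus-constant form, which is why two parameters suffice; harder problems are where a universal approximator becomes the bottleneck.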
All these designs require generalized backpropagation for efficient implementation on massively parallel computing platforms, such as CNN or memristor. Because there is still widespread misunderstanding of backpropagation in some circles, and failure to take advantage of its full power, I will briefly review the chain rule for ordered derivatives and how it fits here.
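To make the chain rule for ordered derivatives concrete, here is a small sketch. For an ordered system in which each z_i is computed from earlier z's, the ordered derivative of the final target z_N with respect to z_i is the direct partial of z_N with respect to z_i, plus the sum over all later j of (ordered derivative of z_N with respect to z_j) times (direct partial of z_j with respect to z_i). The four-variable system below and all function names are made up for illustration.

```python
import math

def forward(x):
    """Ordered system: each variable depends only on earlier ones."""
    z1 = x
    z2 = z1 ** 2
    z3 = math.sin(z1) * z2
    z4 = z2 + z3  # the target
    return [z1, z2, z3, z4]

def direct_partials(z):
    """Direct (one-step) partials; key (j, i) means d z_j / d z_i, 1-based."""
    z1, z2, _z3, _z4 = z
    return {
        (2, 1): 2.0 * z1,
        (3, 1): math.cos(z1) * z2,
        (3, 2): math.sin(z1),
        (4, 2): 1.0,
        (4, 3): 1.0,
    }

def ordered_derivatives(x):
    """Backward sweep returning the ordered derivative of z4 w.r.t. each z_i."""
    z = forward(x)
    d = direct_partials(z)
    n = len(z)
    F = [0.0] * (n + 1)  # F[i] holds the ordered derivative w.r.t. z_i
    F[n] = 1.0           # derivative of the target with respect to itself
    for i in range(n - 1, 0, -1):
        # Sum over every later variable j that depends directly on z_i.
        F[i] = sum(F[j] * d.get((j, i), 0.0) for j in range(i + 1, n + 1))
    return F[1:]

if __name__ == "__main__":
    print(ordered_derivatives(0.7))
```

One backward sweep yields the derivative of the target with respect to every variable at a cost comparable to the forward pass, and each step uses only locally available quantities, which is what makes the method attractive on massively parallel hardware.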