Friday, January 8, 2010

misconceptions about backpropagation and the brain

A friend recently passed on a paper which commented on backpropagation
(see www.werbos.com/Mind.htm) and the brain. Since there is SO much
misinformation out there, it's probably worth posting some comments on that supposedly learned commentary....

==========================================

I summarized:

There is a HUGE amount of miscommunication between disciplines on this and many other topics. It's a bit like the wars between Republicans and Democrats which we
need to deal with and adapt to, like it or not.
Most of the people who say these things about backpropagation in the brain/psych world
do not even know what backpropagation actually is.
==================

My friend said:

Just ran across a paragraph in a long paper on another subject that walked into your territory; thought you might like to see it:

Critique of Back-Propagation:
The method of backwards calculation of weights is not biologically plausible. Thus, it cannot be viewed as a model for learning in biological systems, but as a method to design a network with learning features.
Second, the algorithm uses a digital computer to calculate weights. When the final network is implemented in hardware, however, it has lost its plasticity due to preset weights.
--------------------------------------------------------------------------
That's really stupid. Both analog and digital neural net hardware exists with
adaptable weights of various types. 'Way back in 1994, Caulfield (an optical computing guy)
and I published a joint paper on how to do online learning with adaptable hardware weights,
which works in a practical way even with the simple forms of backpropagation that
untrained psychologists know about.
There was a lull on the neural net hardware front for a few years. But
on February 3-5, Luda and I are scheduled to go to CNNA 2010
(an easy google), a conference with 680 people coming, dedicated ENTIRELY to neural network hardware, which is now finally getting to be recognized as the next big wave.
(But I will probably be a day or two late, because I'm supposed to speak at Herzliya 2010,
another conference you might enjoy googling -- especially the program.)
Now if the Pope says that no such chips exist, and Chua and XILINX say they are actually selling them, who do you believe? The Pope has a higher citation index,
so of course the chips do not exist -- if you believe in hermeneutics, as many now do, as memories of Francis Bacon fade even in large parts of science.
If humans believe more and more in hermeneutics, perhaps they will conclude that computers do not exist, and perhaps the chips will conclude that the humans need not exist.
I hope not, but things are indeed looking sticky lately.
---------------------------------------------------------------
The commenter went on:
This loss is in contrast with the initial motivation to develop neural networks that are biologically inspired. If changes are needed, a computer calculates the weight values anew and updates them. This dependence on a digital computer robs the neural network implementation of its spontaneous ability to restructure itself in response to experience and all the complex dynamics that go with it.
============================================
My response:

Again, this assertion is mainly a matter of ignorance.
In more sophisticated neural network designs, backpropagation is not EVERYTHING!
It is an essential PART of the system.
Rewiring in the brain and rewiring in hardware are complex and important issues.
In fact, in a general system at the level of human brain intelligence or lower, there
are three critical issues at the heart of intelligence: (1) how the "weights" (synapse strengths
and intraneuronal parameters) are adapted or updated, on a continuum of nonzero values;
(2) how they may be set to zero ("nulled out") or reset; (3) how physical rewiring is decided.
In all cases, the important issue is not the physical mechanism (though that's
a key part of figuring out what we really want to know), but the functional
or computational basis for deciding when, and with what parameters, to invoke the physical mechanisms.
With or without rewiring, it is essential to do and understand both (1) and (2)
to understand normal brain intelligence, as well as to replicate brain-like intelligence
in computers. Backpropagation is an essential part of that. Thus the critic's argument here is a non sequitur.
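
To make that division of labor concrete, here is a minimal sketch in Python (my own toy illustration, with made-up data, learning rate, and pruning threshold, not anyone's brain model): backpropagation supplies the error derivatives used for (1), while a separate rule handles (2) by nulling out weights that stay near zero.

```python
# Minimal sketch (illustrative only): backpropagation as ONE part of a larger
# adaptive system.  (1) gradient-based weight updates; (2) a separate rule
# that "nulls out" weights.  Data, learning rate, and threshold are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))             # toy inputs
y = (X @ rng.normal(size=8) > 0) * 1.0    # toy targets

W = rng.normal(scale=0.1, size=(8, 1))    # "weights" (synapse strengths)
alive = np.ones_like(W)                   # mask: 1 = active, 0 = nulled out

for epoch in range(200):
    z = X @ (W * alive)                   # forward pass
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid output
    # (1) backpropagation: error derivatives for every active weight
    grad = X.T @ (p - y[:, None]) / len(X)
    W -= 0.5 * grad * alive
    # (2) a SEPARATE mechanism decides which weights to set to zero
    if epoch % 50 == 49:
        alive[np.abs(W) < 0.01] = 0.0     # prune weights stuck near zero

print("active weights:", int(alive.sum()), "of", W.size)
```

The point of the sketch is simply that computing the gradient and deciding to null out or rewire are different mechanisms, and backpropagation is the feedback signal the other mechanisms can draw on.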

Just for your edification -- it is unclear whether (3) is really significant in the vast majority of humans above the age of two. Until a few years ago, it was believed that no new higher neurons are created in the brains of humans above the age of two; now we know better, but
the survival and use of the new neurons (sent swimming through the blood from generic progenitor cells without higher intelligence) depend on factors probably calculated as part of the same system which performs (1) and (2), which needs feedback to do its job.
BUT AGAIN: I add these later comments just for your edification. I would say more,
but maybe this is not the right context.
--------------------------------------------
The commenter went on:

Moreover, despite its popularity, the algorithm suffers from extensive calculations and, hence, slow training speed.

===================
My response:

When some folks try to turn on computers, it may sometimes take centuries for them to boot up. That's them.
It is true that valid generalization from complex experience takes a lot of time
even when it is done well, by any means whatsoever. Even human brains sometimes take years to learn the full implications of their life's experience. But for ordinary engineering applications, fast learning methods with feedback have been developed. In operations research (which knows more about these convergence speed issues than any other discipline), it would be seen
as a sign of pathetic ignorance or idealistic fantasy to imagine that convergence could be faster without derivatives than with them, if derivatives are used in an intelligent way.
But still, no one on earth can claim to have the optimal design as yet for making
OPTIMAL use of the information we get from backpropagation and from other sources.
(Those who do not even understand this challenge are not likely to contribute to it.)
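
Since the operations-research point may look abstract, here is a small toy comparison (my own example, with hypothetical sizes, not anything from the paper under discussion): backpropagation delivers all n error derivatives for roughly the price of one extra pass through the network, whereas a derivative-free finite-difference scheme has to rerun the whole network once per weight.

```python
# Toy comparison (illustrative only): cost of getting derivatives intelligently
# (backpropagation) versus a derivative-free finite-difference search.
import numpy as np

def loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))    # one full forward pass
    return float(np.mean((p - y) ** 2))

def grad_backprop(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))    # forward pass
    dp = 2.0 * (p - y) / len(y)           # backward pass via the chain rule
    return X.T @ (dp * p * (1.0 - p))     # ALL n derivatives in one sweep

def grad_finite_diff(w, X, y, eps=1e-6):
    base = loss(w, X, y)                  # 1 forward pass ...
    g = np.zeros_like(w)
    for i in range(len(w)):               # ... plus n more, one per weight
        w2 = w.copy(); w2[i] += eps
        g[i] = (loss(w2, X, y) - base) / eps
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1000))           # 1000 weights: ~2 passes for backprop,
y = rng.random(50)                        # 1001 passes for finite differences
w = rng.normal(scale=0.01, size=1000)
print(np.allclose(grad_backprop(w, X, y), grad_finite_diff(w, X, y), atol=1e-4))
```

Both routes give (nearly) the same numbers; the difference is that one scales gracefully with the number of weights and the other does not.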
===================
The commenter went on:

The time required to calculate the error derivatives and to update the weights on a given training exemplar is proportional to the size of the network. The amount of computation is proportional to the number of weights.

===================
My response:
This is again really stupid and ignorant, as a general statement. If neural network designs are emulated on a digital computer, the first sentence is generally true. Thus with a billion neurons
and 10,000 weights each, it would cost on the order of 10 trillion calculations to calculate all
the derivatives. But it would cost about 10 trillion calculations to run the network itself from its inputs to its outputs.
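
To spell out the comparison in the simplest case, here is a sketch of a single dense layer (hypothetical sizes, my own illustration, not the brain figures above): the forward pass does one multiply-add per weight, and the backward pass of backpropagation does the same order of work, only a small constant factor more.

```python
# Sketch (illustrative only): forward and backward passes of a dense layer
# both cost on the order of one multiply-add per weight.
import numpy as np

n_in, n_out = 1000, 1000                  # hypothetical layer size
rng = np.random.default_rng(2)
W = rng.normal(size=(n_in, n_out))        # n_in * n_out weights
x = rng.normal(size=n_in)

y = x @ W                                 # forward: n_in * n_out multiply-adds
grad_y = np.ones(n_out)                   # stand-in error signal from above

grad_x = W @ grad_y                       # backward: n_in * n_out multiply-adds
grad_W = np.outer(x, grad_y)              # plus n_in * n_out multiplies for dW

print("weights:", W.size,
      "| forward mult-adds:", n_in * n_out,
      "| backward mult-adds:", 2 * n_in * n_out)
```

The backward pass is roughly a factor of two more arithmetic than the forward pass, not something that grows faster than the network itself.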
The reason why brains and brain-like hardware are possible is that they use massively parallel
calculation. I can understand how someone in another field might be unaware of my own work,
but even a sincere high school kid wouldn't make such strong statements about backpropagation without knowing there exists a book called "Parallel Distributed Processing," whose title should make it obvious what's going on.
In fact, results in the lab of Barry Richmond of NIH show that it takes about
30 milliseconds for the cerebral cortex to make a forward pass calculation from
its inputs to its outputs. It takes about the same to make the backwards pass it also
makes in every alpha time interval (about one-eighth of a second). Roughly 30 ms forward plus 30 ms backward
fits comfortably inside a 125 ms alpha interval, so the timing
is no problem. Backpropagation is a parallel calculation just as much as
invoking a neural network itself.
===============
The commenter said:

In 1986 Sejnowski and Rosenberg described a simple back-propagation network simulation that reads and properly pronounces unrestricted English text with good accuracy. They named their system NETtalk. Large computer programs exist that accomplish the same task, but these programs, which take many hundreds of hours to write, painstakingly encode each English pronunciation rule and provide lookup tables to cover the many exceptions in the language. The NETtalk simulation program took only a short time to write and contains no explicit pronunciation rules or lookup tables.
---------------------------------

My response:

I have to admit that I do not know anything about the literature on
practical technology products which try to do what NETtalk did.
Thus I have no basis for verifying or contradicting the claims in the paragraph above, except perhaps by analogy with other areas. Certainly I know
of areas where one group used traditional painstaking methods over years to perform
the same task that was performed very quickly (either before or after) by use
of a neural network. But there is also a great diversity of people using neural networks, some trained and some less trained, and some trying hard and others trying to get bad results.
At www.werbos.com/oil.htm, I posted some slides which included examples of applications where people spent MANY millions or billions of dollars using a wide variety of traditional, AI and control theory methods on an important application... where a well-crafted neural network system substantially outperformed them all. Those are for decision applications.
For "prediction" -- at www.werbos.com/Mind.htm#Mouse, I posted some recent slides given at Chile.
It is still a major challenge here to get the word out and to upgrade the educational
systems.
Best regards,
Paul