The Project Gutenberg eBook of Self-Organizing Systems, 1963

This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.

Title: Self-Organizing Systems, 1963

Editor: James Emmett Garvey

Release date: September 13, 2021 [eBook #66286]
Most recently updated: October 18, 2024

Language: English

Credits: Mark C. Orton and the Online Distributed Proofreading Team at https://www.pgdp.net

*** START OF THE PROJECT GUTENBERG EBOOK SELF-ORGANIZING SYSTEMS, 1963 ***

SELF-ORGANIZING SYSTEMS
1963

Edited By

JAMES EMMETT GARVEY

Office of Naval Research
Pasadena, California

ACR-96

OFFICE OF NAVAL RESEARCH
DEPARTMENT OF THE NAVY
WASHINGTON, D.C.

For sale by the Superintendent of Documents.
U.S. Government Printing Office
Washington, D.C., 20402—Price $1.50


CONTENTS

Foreword iv
The Ionic Hypothesis and Neuron Models 1
—E. R. Lewis  
Fields and Waves in Excitable Cellular Structures 19
—R. M. Stewart  
Multi-Layer Learning Networks 37
—R. A. Stafford  
Adaptive Detection of Unknown Binary Waveforms 46
—J. J. Spilker, Jr.  
Conceptual Design of Self-Organizing Machines 52
—P. A. Kleyn  
A Topological Foundation for Self-Organization 65
—R. I. Ścibor-Marchocki  
On Functional Neuron Modeling 71
—C. E. Hendrix  
Selection of Parameters for Neural Net Simulations 76
—R. K. Overton  
Index of Invited Participants 77

[Pg iv]

FOREWORD

The papers appearing in this volume were presented at a Symposium on Self-Organizing Systems, which was sponsored by the Office of Naval Research and held at the California Institute of Technology, Pasadena, California, on 14 November 1963. The Symposium was organized with the aim of providing a critical forum for the presentation and discussion of contemporary significant research efforts, with the emphasis on relatively uncommon approaches and methods in an early state of development. This aim and nature dictated that the Symposium be in effect a Working Group, with numerically limited invitational participation.

The papers which were presented and discussed did in fact serve to introduce several relatively unknown approaches; some of the speakers were promising young scientists, others had become known for contributions in different fields and were as yet unrecognized for their recent work in self-organization. In addition, the papers as a collection provided a particularly broad, cross-disciplinary spectrum of investigations which possessed intrinsic value as a portrayal of the bases upon which this new discipline rests. Accordingly, it became obvious in retrospect that the information presented and discussed at the Symposium was of considerable interest—and should thus receive commensurate dissemination—to a much broader group of scientists and engineers than those who were able to participate directly in the meeting itself. This volume is the result of that observation; as an edited collection of the papers presented at the Symposium, it forms the Proceedings thereof. If it provides a useful reference for present and future investigators, as well as documenting the source of several new approaches, it will have fulfilled its intended purpose well.

A Symposium which takes the nature of a Working Group depends for its utility especially upon effective commentary and critical analysis, and we commend all the participants for their contributions in this regard. It is appropriate, further, to acknowledge the contributions to the success of the Symposium made by the following: The California Institute of Technology for volunteering to act as host and for numerous supporting services; Professor Gilbert D. McCann, Director of the Willis Booth Computing Center at the California Institute of Technology, and the members of the technical and secretarial staffs of the Computing Center, who assumed the responsibility of acting as the immediate representatives of the Institute; the members of the Program Committee, who organized and led the separate sessions—Harold Hamilton [Pg v] of General Precision, Joseph Hawkins of Ford Motor Company, Robert Stewart of Space-General, Peter Kleyn of Northrop, and Professor McCann; members of the Technical Information Division of the Naval Research Laboratory, who published these Proceedings; and especially the authors of the papers, which comprised the heart of the Symposium and subsequently formed this volume. To all of these the sponsors wish to express their very sincere appreciation.

James Emmett Garvey 
Office of Naval Research Branch Office
Pasadena, California 

Margo A. Sass 
Office of Naval Research 
Washington, D.C. 


[Pg 1]

The Ionic Hypothesis and Neuron Models

E. R. Lewis

Librascope Group, General Precision, Inc.
Research and Systems Center
Glendale, California

The measurements of Hodgkin and Huxley were aimed at revealing the mechanism of generation and propagation of the all-or-none spike. Their results led to the Modern Ionic Hypothesis. Since the publication of their papers in 1952, advanced techniques with microelectrodes have led to the discovery of many modes of subthreshold activity not only in the axon but also in the somata and dendrites of neurons. This activity includes synaptic potentials, local response potentials, and pacemaker potentials.

We considered the question, “Can this activity also be explained in terms of the Hodgkin-Huxley Model?” To seek an answer, we have constructed an electronic analog based on the ionic hypothesis and designed around the data of Hodgkin and Huxley. Synaptic inputs were simulated by simple first-order or second-order networks connected directly to simulated conductances (potassium or sodium). The analog has, with slight parameter adjustments, produced all modes of threshold and subthreshold activity.

INTRODUCTION

In recent years physiologists have become quite adept at probing into neurons with intracellular microelectrodes. They are now able, in fact, to measure (a) the voltage change across the postsynaptic membrane elicited by a single presynaptic impulse (see, for example, references 1 and 2) and (b) the voltage-current characteristics across a localized region of the nerve cell membrane (3), (4), (5), (6). With microelectrodes, physiologists have been able to examine not only the all-or-none spike generating and propagating properties of axons but also the electrical properties of somatic and dendritic structures in individual neurons. The resulting observations have led many physiologists to believe that the individual nerve cell is a potentially complex information-processing system far removed from the simple two-state device envisioned by many early modelers. This new concept of the neuron is well summarized by Bullock in his 1959 [Pg 2] Science article (10). In the light of recent physiological literature, one cannot justifiably omit the diverse forms of somatic and dendritic behavior when assessing the information-processing capabilities of single neurons. This is true regardless of the means of assessment, whether one uses mathematical idealizations, electrochemical models, or electronic analogs. We have been interested specifically in electronic analogs of the neuron; and in view of the widely diversified behavior which we must simulate, our first goal has been to find a unifying concept about which to design our analogs. We believe we have found such a concept in the Modern Ionic Hypothesis, and in this paper we will discuss an electronic analog of the neuron which was based on this hypothesis and which simulated not only the properties of the axon but also the various subthreshold properties of the somata and dendrites of neurons.

We begin with a brief summary of the various types of subthreshold activity which have been observed in the somatic and dendritic structures of neurons. This is followed by a brief discussion of the Hodgkin-Huxley data and of the Modern Ionic Hypothesis. An electronic analog based on the Hodgkin-Huxley data is then introduced, and we show how this analog can be used to provide all of the various types of somatic and dendritic activity.

SUBTHRESHOLD ELECTRICAL ACTIVITY IN NEURONS

In studying the recent literature in neurophysiology, one is immediately struck by the diversity in form of both elicited and spontaneous electrical activity in the single nerve cell. This applies not only to the temporal patterns of all-or-none action potentials but also to the graded somatic and dendritic potentials. The synaptic membrane of a neuron, for example, is often found to be electrically inexcitable and thus incapable of producing an action potential; yet the graded, synaptically induced potentials show an amazing diversity in form. In response to a presynaptic impulse, the postsynaptic membrane may become hyperpolarized (inhibitory postsynaptic potential), depolarized (excitatory postsynaptic potential), or remain at the resting potential but with an increased permeability to certain ions (a form of inhibition). The form of the postsynaptic potential in response to an isolated presynaptic spike may vary from synapse to synapse in several ways, as shown in Figure 1. Following a presynaptic spike, the postsynaptic potential typically rises with some delay to a peak value and then falls back toward the equilibrium or resting potential. Three potentially important factors are the delay time (synaptic delay), the peak amplitude (spatial weighting of synapse), and the rate of fall toward the equilibrium potential (temporal weighting of synapse). The responses of a synapse to individual spikes in a volley may be progressively enhanced (facilitation), diminished (antifacilitation), or neither (1), (2), (7), (8). Facilitation may be in the form of progressively increased peak amplitude, or in the form of progressively decreased rate of fall (see Figure 2). The time course and magnitude of facilitation or antifacilitation may very well be important synaptic parameters. 
In addition, the postsynaptic membrane sometimes exhibits excitatory or inhibitory aftereffects (or both) on cessation of a volley of presynaptic spikes (2), (7); and the time course and magnitude of the aftereffects may be important parameters. Clearly, even if one considers the synaptic potentials alone, he is faced with an impressive variety of responses. Examples of the various types of postsynaptic responses may be found in the literature, but for purposes of the present discussion the idealized wave forms in Figure 2 will demonstrate the diversity of electrical behavior with which one is faced. [Pg 3]

Figure 1—Excitatory postsynaptic potentials in response to a single presynaptic spike


[Pg 4]

Figure 2—Idealized postsynaptic potentials

[Pg 5] In addition to synaptically induced potentials, low-frequency, spontaneous potential fluctuations have been observed in many neurons (2), (7), (9), (10), (11). These fluctuations, generally referred to as pacemaker potentials, are usually rhythmic and may be undulatory or more nearly saw-toothed in form. The depolarizing phase may be accompanied by a spike, a volley of spikes, or no spikes at all. Pacemaker frequencies have been noted from ten or more cycles per second down to one cycle every ten seconds or more. Some idealized pacemaker wave forms are shown in Figure 3.

Figure 3—Idealized pacemaker potentials


[Pg 6]

Figure 4—Graded response

Bullock (7), (10), (12), (13) has demonstrated the existence of a third type of subthreshold response, which he calls the graded response. While the postsynaptic membrane is quite often electrically inexcitable, other regions of the somatic and dendritic membranes appear to be moderately excitable. It is in these regions that Bullock observes the graded response. If one applies a series of pulsed voltage stimuli to the graded-response region, the observed responses are similar to those shown in Figure 4A. Plotting the peak response voltage as a function of the stimulus voltage yields a curve similar to that in Figure 4B (see Ref. 3, page 4). [Pg 7] For small values of input voltage, the response curve is linear; the membrane is passive. As the stimulus voltage is increased, however, the response becomes more and more disproportionate: the membrane is actively amplifying the stimulus potential. At even higher values of stimulus potential, the system becomes regenerative; and a full action potential results. The peak amplitude of the response depends on the duration of the stimulus as well as on the amplitude. It also depends on the rate of application of the stimulus voltage. If the stimulus potential is a voltage ramp, for example, the response will depend on the slope of the ramp. If the rate of rise is sufficiently low, the membrane will respond in a passive manner to voltages much greater than the spike threshold for suddenly applied voltages. In other words, the graded-response regions appear to accommodate to slowly varying potentials.

In terms of functional operation, we can think of the synapse as a transducer. The input to this transducer is a spike or series of spikes in the presynaptic axon. The output is an accumulative, long-lasting potential which in some way (perhaps not uniquely) represents the pattern of presynaptic spikes. The pacemaker appears to perform the function of a clock, producing periodic spikes or spike bursts or producing periodic changes in the over-all excitability of the neuron. The graded-response regions appear to act as nonlinear amplifiers and, occasionally, spike initiators. The net result of this electrical activity is transformed into a series of spikes which originate at spike initiation sites and are propagated along axons to other neurons. The electrical activity in the neuron described above is summarized in the following outline (taken in part from Bullock (7)):

[Pg 8]

THE MODERN IONIC HYPOTHESIS

Hodgkin, Huxley, and Katz (3) and Hodgkin and Huxley (14), (15), (16), in 1952, published a series of papers describing detailed measurements of voltage, current, and time relationships in the giant axon of the squid (Loligo). Hodgkin and Huxley (17) consolidated and formalized these data into a set of simultaneous differential equations describing the hypothetical time course of events during spike generation and propagation. The hypothetical system which these equations describe is the basis of the Modern Ionic Hypothesis.

The system proposed by Hodgkin and Huxley is basically one of dynamic opposition of ionic fluxes across the axon membrane. The membrane itself forms the boundary between two liquid phases—the intracellular fluid and the extracellular fluid. The intracellular fluid is rich in potassium ions and immobile organic anions, while the extracellular fluid contains an abundance of sodium ions and chloride ions. The membrane is slightly permeable to the potassium, sodium, and chloride ions; so these ions tend to diffuse across the membrane. When the axon is inactive (not propagating a spike), the membrane is much more permeable to chloride and potassium ions than it is to sodium ions. In this state, in fact, sodium ions are actively transported from the inside of the membrane to the outside at a rate just sufficient to balance the inward leakage. The relative sodium ion concentrations on both sides of the membrane are thus fixed by the active transport rate, and the net sodium flux across the membrane is effectively zero. The potassium ions, on the other hand, tend to move out of the cell; while chloride ions tend to move into it. The inside of the cell thus becomes negative with respect to the outside. When the potential across the membrane is sufficient to balance the inward diffusion of chloride with an equal outward drift, and the outward diffusion of potassium with an inward drift (and possibly an inward active exchange), equilibrium is established. The equilibrium potential is normally in the range of 60 to 65 millivolts.
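The balance between diffusion and drift described above is conventionally expressed by the Nernst relation, which the paper does not write out. The following is a minimal sketch under stated assumptions: the ion concentrations and the 18 °C temperature are illustrative textbook-style values, not figures from this paper, and the names `nernst_mv` and `ASSUMED_CONCENTRATIONS` are hypothetical.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_mv(valence, conc_out, conc_in, temp_c=18.0):
    """Equilibrium potential (inside relative to outside) in millivolts."""
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (valence * F) * math.log(conc_out / conc_in)

# Illustrative concentrations in mM. These are ASSUMED values chosen only to
# mirror the qualitative picture in the text (K rich inside, Na and Cl rich outside).
ASSUMED_CONCENTRATIONS = {
    "K":  {"valence": +1, "out": 20.0,  "in": 400.0},
    "Na": {"valence": +1, "out": 440.0, "in": 50.0},
    "Cl": {"valence": -1, "out": 560.0, "in": 52.0},
}

def equilibrium_potentials(temp_c=18.0):
    """Nernst potential for each assumed ion species, in millivolts."""
    return {
        ion: nernst_mv(p["valence"], p["out"], p["in"], temp_c)
        for ion, p in ASSUMED_CONCENTRATIONS.items()
    }

if __name__ == "__main__":
    for ion, e_mv in equilibrium_potentials().items():
        print(f"E_{ion} = {e_mv:.1f} mV")
```

With these assumed concentrations the potassium and chloride potentials are negative and the sodium potential is positive, which is consistent with the text's statement that the resting interior is negative while sodium is held far from its own equilibrium by active transport.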

The resting neural membrane is thus polarized, with the inside approximately 60 millivolts negative with respect to the outside. Most of the Hodgkin-Huxley data is based on measurements of the transmembrane current in response to an imposed stepwise reduction (depolarization) of membrane potential. By varying the external ion concentrations, Hodgkin and Huxley were able to resolve the transmembrane current into two “active” components, the potassium ion current and the sodium ion current. They found that while the membrane [Pg 9] permeabilities to chloride and most other inorganic ions were relatively constant, the permeabilities to both potassium and sodium were strongly dependent on membrane potential. In response to a suddenly applied (step) depolarization, the sodium permeability rises rapidly to a peak and then declines exponentially to a steady value. The potassium permeability, on the other hand, rises with considerable delay to a value which is maintained as long as the membrane remains depolarized. The magnitudes of both the potassium and the sodium permeabilities increase monotonically with increasing depolarization. A small imposed depolarization will result in an immediately increased sodium permeability. The resulting increased influx of sodium ions results in further depolarization; and the process becomes regenerative, producing the all-or-none action potential. At the peak of the action potential, the sodium conductance begins to decline, while the delayed potassium conductance is increasing. Recovery is brought about by an efflux of potassium ions, and both ionic permeabilities fall rapidly as the membrane is repolarized. The potassium permeability, however, falls less rapidly than that of sodium. This is basically the explanation of the all-or-none spike according to the Modern Ionic Hypothesis.
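The regenerative mechanism described above can be explored numerically. The sketch below is not the author's analog circuit: it assumes the standard textbook form of the Hodgkin-Huxley equations and constants (modern sign convention, resting potential near -65 mV) and integrates them with a simple forward-Euler step. All function names are hypothetical.

```python
import math

# Standard textbook Hodgkin-Huxley constants (ASSUMED; modern sign convention).
# Units: mV, ms, uF/cm^2, mS/cm^2, uA/cm^2.
C_M = 1.0
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387
V_REST = -65.0

def _ratio(x, y):
    """x / (1 - exp(-x / y)); returns the limit y at the removable singularity x = 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))

def rates(v):
    """(opening, closing) rate constants in 1/ms for the m, h and n gates."""
    m = (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0))
    h = (0.07 * math.exp(-(v + 65.0) / 20.0), 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0)))
    n = (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0))
    return m, h, n

def simulate(stim_amp, stim_start=1.0, stim_dur=1.0, t_end=20.0, dt=0.01):
    """Membrane potential trace (mV) for a rectangular current pulse."""
    first = int(round(stim_start / dt))
    last = first + int(round(stim_dur / dt))
    v = V_REST
    gates = [a / (a + b) for a, b in rates(v)]  # m, h, n at their resting values
    trace = []
    for step in range(int(round(t_end / dt))):
        m, h, n = gates
        i_stim = stim_amp if first <= step < last else 0.0
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        # Gates and voltage are both advanced from the old voltage (forward Euler).
        gates = [x + dt * (a * (1.0 - x) - b * x)
                 for x, (a, b) in zip(gates, rates(v))]
        v += dt * (i_stim - i_ion) / C_M
        trace.append(v)
    return trace

def peak_voltage(stim_amp):
    """Largest membrane potential reached after a 1 ms pulse of the given size."""
    return max(simulate(stim_amp))

if __name__ == "__main__":
    for amp in (2.0, 20.0, 40.0):
        print(f"pulse {amp:5.1f} uA/cm^2 -> peak {peak_voltage(amp):6.1f} mV")
```

A weak pulse leaves the membrane near rest, while two quite different suprathreshold pulses produce spikes of similar height, which is the all-or-none behavior the paragraph attributes to the regenerative sodium influx.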

Figure 5—Hodgkin-Huxley representation of small area of axon membrane

Figure 6—Typical responses of sodium conductance and potassium conductance to imposed step depolarization

By defining the net driving force on any given ion species as the difference between the membrane potential and the equilibrium potential for that ion and describing permeability changes in terms of equivalent electrical conductance changes, Hodgkin and Huxley reduced the ionic [Pg 10] model to the electrical equivalent in Figure 5. The important dynamic variables in this equivalent network are the sodium conductance (GNa) and the potassium conductance (GK). The change in the sodium conductance in response to a step depolarization is shown in Figure 6A. This change can be characterized by seven voltage-dependent parameters: [Pg 11]

1. Delay time—generally much less than 1 msec

2. Rise time—1 msec or less

3. Magnitude of peak conductance—increases monotonically with increasing depolarization

4. Inactivation time constant—decreases monotonically with increasing depolarization.

5. Time constant of recovery from inactivation—incomplete data

6. Magnitude of steady-state conductance—increases monotonically with increasing depolarization

7. Fall time on sudden repolarization—less than 1 msec.

Figure 6B shows the potassium conductance change in response to an imposed step depolarization. Four parameters are sufficient to characterize this response:

1. Delay time—decreases monotonically with increasing depolarization

2. Rise time—decreases monotonically with increasing depolarization

3. Magnitude of steady-state conductance—increases monotonically with increasing depolarization

4. Fall time on sudden repolarization—8 msec or more, decreases slightly with increasing depolarization.
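The qualitative parameters listed above can be read off a simulated step-depolarization (voltage-clamp) response. The sketch below assumes the standard textbook Hodgkin-Huxley rate functions and constants in the modern sign convention; it is an illustration, not a reproduction of the measured curves in Figure 6, and `clamp_conductances` is a hypothetical name.

```python
import math

# Standard textbook Hodgkin-Huxley constants (ASSUMED); mV, ms, mS/cm^2.
G_NA, G_K = 120.0, 36.0
V_REST = -65.0

def _ratio(x, y):
    """x / (1 - exp(-x / y)); returns the limit y at the removable singularity x = 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))

def rates(v):
    """(opening, closing) rate constants in 1/ms for the m, h and n gates."""
    m = (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0))
    h = (0.07 * math.exp(-(v + 65.0) / 20.0), 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0)))
    n = (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0))
    return m, h, n

def clamp_conductances(v_step, t_end=10.0, dt=0.05):
    """Sodium and potassium conductances after a step from rest to v_step.

    With the voltage held constant each gate relaxes exponentially, so the
    closed-form solution x(t) = x_inf + (x0 - x_inf) * exp(-t / tau) is used.
    """
    start = [a / (a + b) for a, b in rates(V_REST)]       # m0, h0, n0
    final = [a / (a + b) for a, b in rates(v_step)]       # m_inf, h_inf, n_inf
    taus = [1.0 / (a + b) for a, b in rates(v_step)]      # time constants, ms
    times, g_na, g_k = [], [], []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        m, h, n = (xf + (x0 - xf) * math.exp(-t / tau)
                   for x0, xf, tau in zip(start, final, taus))
        times.append(t)
        g_na.append(G_NA * m**3 * h)   # fast rise, then inactivation
        g_k.append(G_K * n**4)         # delayed rise to a maintained value
    return times, g_na, g_k

if __name__ == "__main__":
    for v_step in (-25.0, -5.0):
        times, g_na, g_k = clamp_conductances(v_step)
        i_peak = g_na.index(max(g_na))
        print(f"step to {v_step} mV: GNa peaks at t = {times[i_peak]:.2f} ms, "
              f"GK final = {g_k[-1]:.1f} mS/cm^2")
```

In this sketch the sodium conductance rises to a peak and then declines, the potassium conductance rises with a delay to a maintained value, and both grow with larger steps, matching the trends stated in the two lists.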

In addition to the aforementioned parameters, the transient portion of the sodium conductance appears to exhibit an accommodation to slowly varying membrane potentials. The time constants of accommodation appear to be those of inactivation or recovery from inactivation—depending on the direction of change in the membrane potential (18). The remaining elements in the Hodgkin-Huxley model are constant and are listed below:

ELECTRONIC SIMULATION OF THE HODGKIN-HUXLEY MODEL

Figure 7—System diagram for electronic simulation of the Hodgkin-Huxley model

Given a suitable means of generating the conductance functions, GNa(v,t) and GK(v,t), one can readily simulate the essential aspects of the Modern Ionic Hypothesis. If we wish to do this electronically, we have two problems. First, we must synthesize a network whose input is the membrane potential and whose output is a [Pg 12] voltage or current proportional to the desired conductance function. Second, we must transform the output from a voltage or current to an effective electronic conductance. The former implies the need for nonlinear, active filters, while the latter implies the need for multipliers. The basic block diagram is shown in Figure 7. Several distinct realizations of this system have been developed in our laboratory, and in each case the results were the same. With parameters adjusted to closely match the data of Hodgkin and Huxley, the electronic model exhibits all of the important properties of the axon. It produces spikes of 1 to 2 msec duration with a threshold of approximately 5% to 10% of the spike amplitude. The applied stimulus is [Pg 13] generally followed by a prepotential, then an active rise of less than 1 msec, followed by an active recovery. The after-depolarization generally lasts several msec, followed by a prolonged after-hyperpolarization. The model exhibits the typical strength-duration curve, with rheobase of 5% to 10% of the spike amplitude. For sufficiently prolonged sodium inactivation (long time constant of recovery from inactivation), the model also exhibits an effect identical to classical Wedensky inhibition (18). Thus, as would be expected, the electronic model simulates very well the electrical properties of the axon.
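The strength-duration behavior mentioned above can be illustrated in software rather than hardware. The sketch below is a numerical stand-in for the electronic model, assuming the standard textbook Hodgkin-Huxley equations and constants; it finds by bisection the smallest rectangular current pulse that elicits a spike for several pulse durations. The names `fires` and `threshold` and the 0 mV spike criterion are hypothetical choices.

```python
import math

# Standard textbook Hodgkin-Huxley constants (ASSUMED; modern sign convention).
C_M = 1.0
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387
V_REST = -65.0

def _ratio(x, y):
    """x / (1 - exp(-x / y)); returns the limit y at the removable singularity x = 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))

def rates(v):
    """(opening, closing) rate constants in 1/ms for the m, h and n gates."""
    m = (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0))
    h = (0.07 * math.exp(-(v + 65.0) / 20.0), 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0)))
    n = (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0))
    return m, h, n

def fires(stim_amp, stim_dur, t_end=25.0, dt=0.01):
    """True if a pulse of this amplitude and duration drives V above 0 mV."""
    first = int(round(1.0 / dt))                 # pulse starts at t = 1 ms
    last = first + int(round(stim_dur / dt))
    v = V_REST
    gates = [a / (a + b) for a, b in rates(v)]
    for step in range(int(round(t_end / dt))):
        m, h, n = gates
        i_stim = stim_amp if first <= step < last else 0.0
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        gates = [x + dt * (a * (1.0 - x) - b * x)
                 for x, (a, b) in zip(gates, rates(v))]
        v += dt * (i_stim - i_ion) / C_M
        if v > 0.0:
            return True
    return False

def threshold(stim_dur, lo=0.0, hi=400.0, iterations=14):
    """Bisect for the smallest pulse amplitude (uA/cm^2) that produces a spike."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fires(mid, stim_dur):
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    for dur in (0.2, 1.0, 5.0):
        print(f"duration {dur:4.1f} ms -> threshold {threshold(dur):6.2f} uA/cm^2")
```

The threshold amplitude falls as the pulse lengthens and levels off for long pulses, which is the general shape of a strength-duration curve; the numerical values belong to the assumed textbook model and should not be read as the analog's measured figures.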

In addition to the axon properties, however, the electronic model is able to reproduce all of the somatic and dendritic activity outlined in the section on subthreshold activity. Simulation of the pacemaker and graded-response potentials is accomplished without additional circuitry. In the case of synaptically induced potentials, however, auxiliary networks are required. These networks provide additive terms to the variable conductances in accordance with current notions on synaptic transmission (19). Two types of networks have been used. In both, the inputs are simulated presynaptic spikes, and in both the outputs are the resulting simulated chemical transmitter concentration. In both, the transmitter substance was assumed to be injected at a constant rate during a presynaptic spike and subsequently inactivated in the presence of an enzyme. One network simulates a first-order chemical reaction, where the enzyme concentration is effectively constant. The other simulates a second-order chemical reaction, where the enzyme concentration is assumed to be reduced during the inactivation process. For simulation of an excitatory synapse, the output of the auxiliary network is added directly to GNa in the electronic model. For inhibition, it is added to GK. With the parameters of the electronic membrane model set at the values measured by Hodgkin and Huxley, we have attempted to simulate synaptic activity with the aid of the two types of auxiliary networks. In the case of the simulated first-order reaction, the excitatory synapse exhibits facilitation, antifacilitation, or neither—depending on the setting of a single parameter, the transmitter inactivation rate (i.e., the effective enzyme concentration). This parameter would appear, in passing, to be one of the most probable synaptic variables. In this case, the mechanisms for facilitation and antifacilitation are contained in the simulated postsynaptic membrane. 
Facilitation is due to the nonlinear dependence of GNa on membrane potential, while antifacilitation is due to inactivation of GNa. The occurrence of one form of response or the other is determined by the relative importance of the two mechanisms (18). Grundfest (20) has mentioned [Pg 14] both of these mechanisms as potentially facilitory and antifacilitory, respectively. The simulated inhibitory synapse with the first-order input is capable of facilitation (18), but no antifacilitation has been observed. Again, the presence or absence of facilitation is determined by the inactivation rate.
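The first-order auxiliary network described above can be sketched as a single differential equation: transmitter is injected at a constant rate during each presynaptic spike and is inactivated at a rate proportional to its own concentration. The code below models only that transmitter term, which in the paper would then be added to GNa or GK; the postsynaptic membrane mechanisms are deliberately left out. All parameter values and names are assumptions for illustration.

```python
def transmitter_peaks(k_inact, n_spikes=5, interval=10.0, width=1.0,
                      inject_rate=1.0, dt=0.01):
    """Peak transmitter concentration reached after each spike of a volley.

    Integrates dT/dt = inject(t) - k_inact * T with forward Euler, where
    inject(t) equals inject_rate during a spike and zero otherwise.
    Times are in ms; concentration is in arbitrary units.
    """
    steps_per_interval = int(round(interval / dt))
    steps_per_spike = int(round(width / dt))
    conc = 0.0
    peaks = []
    for _ in range(n_spikes):
        peak = 0.0
        for step in range(steps_per_interval):
            inject = inject_rate if step < steps_per_spike else 0.0
            conc += dt * (inject - k_inact * conc)
            peak = max(peak, conc)
        peaks.append(peak)
    return peaks

if __name__ == "__main__":
    for k in (0.02, 2.0):   # slow and fast inactivation, in 1/ms (assumed values)
        formatted = ", ".join(f"{p:.3f}" for p in transmitter_peaks(k))
        print(f"k = {k}: peaks = {formatted}")
```

With slow inactivation the transmitter term accumulates across the volley, and with fast inactivation each spike produces the same isolated transient. How that term then shapes the postsynaptic potential depends on the membrane model, which is where the paper locates facilitation and antifacilitation in the first-order case.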

With the simulated second-order reaction, both excitatory and inhibitory synapses exhibit facilitation. In this case, two facilitory mechanisms are present—one in the postsynaptic membrane and one in the nonconstant transmitter inactivation reaction. The active membrane currents can, in fact, be removed; and this system will still exhibit facilitation. With the second-order auxiliary network, the presence of excitatory facilitation, antifacilitation, or neither depends on the initial, or resting, transmitter inactivation rate. The synaptic behavior also depends parametrically on the simulated enzyme reactivation rate. Inhibitory antifacilitation can be introduced with either type of auxiliary network by limiting the simulated presynaptic transmitter supply.
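The second-order case can be sketched by letting the inactivating enzyme be consumed as it inactivates transmitter and then slowly reactivated. The equations below are one plausible reading of the description, not the circuit actually used: the rate constants, the resting enzyme level `e_rest`, and the function name are all assumptions.

```python
def second_order_volley(k_react=1.0, e_rest=1.0, reactivation=0.01,
                        n_spikes=5, interval=10.0, width=1.0,
                        inject_rate=0.5, dt=0.01):
    """Transmitter peaks and final enzyme level for a presynaptic volley.

    Assumed kinetics (arbitrary concentration units, times in ms):
        dT/dt = inject(t) - k_react * E * T
        dE/dt = -k_react * E * T + reactivation * (e_rest - E)
    so each inactivation event uses up enzyme, which recovers slowly.
    """
    steps_per_interval = int(round(interval / dt))
    steps_per_spike = int(round(width / dt))
    transmitter, enzyme = 0.0, e_rest
    peaks = []
    for _ in range(n_spikes):
        peak = 0.0
        for step in range(steps_per_interval):
            inject = inject_rate if step < steps_per_spike else 0.0
            reaction = k_react * enzyme * transmitter
            transmitter += dt * (inject - reaction)
            enzyme += dt * (-reaction + reactivation * (e_rest - enzyme))
            peak = max(peak, transmitter)
        peaks.append(peak)
    return peaks, enzyme

if __name__ == "__main__":
    peaks, enzyme_left = second_order_volley()
    print("peaks:", ", ".join(f"{p:.3f}" for p in peaks))
    print(f"enzyme remaining: {enzyme_left:.3f}")
```

Because the enzyme is depleted during the volley, each successive spike is inactivated more slowly and the transmitter peaks grow. This facilitation arises with no membrane model at all, in line with the remark that the active membrane currents can be removed and the system will still exhibit facilitation.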

Certain classes of aftereffects are inherent in the mechanisms of the Ionic Hypothesis. In the electronic model, aftereffects are observed following presynaptic volleys with either type of auxiliary network. Following a volley of spikes into the simulated excitatory synapse, for example, rebound hyperpolarization may or may not occur depending on the simulated transmitter inactivation rate. If the inactivation rate is sufficiently high, rebound will occur. This rebound can be monophasic (inhibitory phase only) or polyphasic (successive cycles of excitation and inhibition). Following a volley of spikes into the simulated inhibitory synapse, rebound depolarization may or may not occur depending on the simulated transmitter inactivation rate. This rebound can also be monophasic or polyphasic. Sustained postexcitatory depolarization and sustained postinhibitory hyperpolarization (2) have been achieved in the model by making the transmitter inactivation rate sufficiently low.

The general forms of the postsynaptic potentials simulated with the electronic model are strikingly similar to those published in the literature for real neurons. The first-order auxiliary network produces facilitation of a form almost identical to that shown by Otani and Bullock (8) while the second-order auxiliary network produces facilitation of the type shown by Chalazonitis and Arvanitaki (2). The excitatory antifacilitation is almost identical to that shown by Hagiwara and Bullock (1) in both form and dependence on presynaptic spike frequency. In every case, the synaptic behavior is determined by the effective rate of transmitter inactivation, which in real neurons [Pg 15] would presumably be directly proportional to the effective concentration of inactivating enzyme at the synapse.

Pacemaker potentials are easily simulated with the electronic model without the use of auxiliary networks. This is achieved either by inserting a large, variable shunt resistor across the simulated membrane (see Figure 5) or by allowing a small sodium current leakage at the resting potential. With the remaining parameters of the model set as close as possible to the values determined by Hodgkin and Huxley, the leakage current induces low-frequency, spontaneous spiking. The spike frequency increases monotonically with increasing leakage current. In addition, if the sodium conductance inactivation is allowed to accumulate over several spikes, periodic spike pairs and spike bursts will result. Subthreshold pacemaker potentials have also been observed in the model, but with parameter values set close to the Hodgkin-Huxley data these are generally higher in frequency than pacemaker potentials in real neurons. It is interesting that a pacemaker mode may exist in the absence of the simulated sodium conductance. It is a very high-frequency mode (50 cps or more) and results from the alternating dominance of potassium current and chloride (or leakage ion) current in determining the membrane potential. The significance of this mode cannot be assessed until better data is available for the potassium conductance at low levels of depolarization in real neurons. In general, as far as the model is concerned, pacemaker potentials are possible because the potassium conductance is delayed in both its rise with depolarization and its fall with repolarization.
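The dependence of spontaneous spike frequency on leakage current can be illustrated with a software stand-in for the analog. The sketch assumes the standard textbook Hodgkin-Huxley equations and constants, and it represents the leakage as a constant depolarizing current `i_leak`, which is a simplification of the sodium leakage or shunt resistor described above. Names and parameter values are hypothetical.

```python
import math

# Standard textbook Hodgkin-Huxley constants (ASSUMED; modern sign convention).
C_M = 1.0
G_NA, G_K, G_L = 120.0, 36.0, 0.3
E_NA, E_K, E_L = 50.0, -77.0, -54.387
V_REST = -65.0

def _ratio(x, y):
    """x / (1 - exp(-x / y)); returns the limit y at the removable singularity x = 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))

def rates(v):
    """(opening, closing) rate constants in 1/ms for the m, h and n gates."""
    m = (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0))
    h = (0.07 * math.exp(-(v + 65.0) / 20.0), 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0)))
    n = (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0))
    return m, h, n

def spike_count(i_leak, t_end=200.0, dt=0.01, v_detect=-20.0):
    """Number of spikes in t_end ms under a constant depolarizing current.

    A spike is counted at each upward crossing of v_detect (mV).
    """
    v = V_REST
    gates = [a / (a + b) for a, b in rates(v)]
    count = 0
    for _ in range(int(round(t_end / dt))):
        m, h, n = gates
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        gates = [x + dt * (a * (1.0 - x) - b * x)
                 for x, (a, b) in zip(gates, rates(v))]
        v_next = v + dt * (i_leak - i_ion) / C_M
        if v < v_detect <= v_next:
            count += 1
        v = v_next
    return count

if __name__ == "__main__":
    for current in (0.0, 10.0, 40.0):
        print(f"i_leak = {current:4.1f} uA/cm^2 -> {spike_count(current)} spikes in 200 ms")
```

With no leakage the sketch stays silent, and a larger constant current yields more spikes in the same interval, which parallels the monotonic frequency increase reported for the electronic model. The sketch does not attempt the spike pairs, bursts, or subthreshold pacemaker modes, since those depend on parameter adjustments the paper does not specify.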

Rate-sensitive graded response has also been observed in the electronic model. The rate sensitivity (or accommodation) is due to the sodium conductance inactivation. The response of the model to an imposed ramp depolarization was discussed in Reference 18. At that time, several alternative model parameters could be altered to bring about reduced electrical excitability. None of the parameter changes was very satisfying, however, because none of them was in any way justified by physiological data. We have since found that the membrane capacitance, a plausible parameter in view of recent physiological findings, can completely determine the electrical excitability. Thus, with the capacitance determined by Hodgkin and Huxley (1 microfarad per cm²), the model exhibits excitability characteristic of the axon. As the capacitance is increased, the model becomes less excitable until, with 10 or 12 microfarads per cm², it is effectively inexcitable. Thus, with an increased [Pg 16] capacitance, but with all the remaining parameters set as close as possible to the Hodgkin-Huxley values, the electronic model exhibits the characteristics of Bullock’s graded-response regions.

Whether membrane capacitance is the determining factor in real neurons is, of course, a matter of speculation. Quite a controversy is raging over membrane capacity measurements (see Rall (21)), but the evidence indicates that the capacity in the soma is considerably greater than that in the axon (6), (22).

It should be added that increasing the capacitance until the membrane model becomes inexcitable has little effect on the variety of available simulated synaptic responses. Facilitation, antifacilitation, and rebound are still present and still depend on the transmitter inactivation rate. Thus, in the model, we can have a truly inexcitable membrane which nevertheless utilizes the active membrane conductances to provide facilitation or antifacilitation, and rebound. The simulated subthreshold pacemaker potentials are much more realistic with the increased capacitance, being lower in frequency and more natural in form.

In one case, the electronic model predicted behavior which was subsequently reported in real neurons. This was in respect to the interaction of synaptic potentials and pacemaker potential. It was noted in early experiments that when the model was set in a pacemaker mode, and periodic spikes were applied to the simulated inhibitory synapse, the pacemaker frequency could be modified; and, in fact, it would tend to lock on to the stimulus frequency. This produced a paradoxical effect whereby the frequency of spontaneous spikes was actually increased by increasing the frequency of inhibitory synaptic stimuli. At very low stimulus frequencies, the spontaneous pacemaker frequency was not appreciably perturbed. As the stimulus frequency was increased, and approached the basic pacemaker frequency, the latter tended to lock on and follow further increases in the stimulus frequency. When the stimulus frequency became too high for the pacemaker to follow, the latter decreased abruptly in frequency and locked on to the first subharmonic. As the stimulus frequency was further increased, the pacemaker frequency would increase, then skip to the next harmonic, then increase again, etc. This type of behavior was observed by Moore et al. (23) in Aplysia and reported at the San Diego Symposium for Biomedical Electronics shortly after it was observed by the author in the electronic model.

Thus, we have shown that an electronic analog with all parameters except membrane capacitance fixed at values close to those of Hodgkin and Huxley can provide all of the normal threshold or axonal behavior [Pg 17] and also all of the subthreshold somatic and dendritic behavior outlined on page 7. Whether or not this is of physiological significance, it certainly provides a unifying basis for construction of electronic neural analogs. Simple circuits, based on the Hodgkin-Huxley model and providing all of the aforementioned behavior, have been constructed with ten or fewer inexpensive transistors with a normal complement of associated circuitry (18). In the near future we hope to utilize several models of this type to help assess the information-processing capabilities not only of individual neurons but also of small groups or networks of neurons.

REFERENCES

1. Hagiwara, S., and Bullock, T. H.
  “Intracellular Potentials in Pacemaker and Integrative Neurons of the Lobster Cardiac Ganglion,”
  J. Cell and Comp. Physiol. 50 (No. 1):25-48 (1957)
2. Chalazonitis, N., and Arvanitaki, A.,
  “Slow Changes during and following Repetitive Synaptic Activation in Ganglion Nerve Cells,”
  Bull. Inst. Oceanogr. Monaco No. 1225:1-23 (1961)
3. Hodgkin, A. L., Huxley, A. F., and Katz, B.,
  “Measurement of Current-Voltage Relations in the Membrane of the Giant Axon of Loligo,”
  J. Physiol. 116:424-448 (1952)
4. Hagiwara, S., and Saito, N.,
  “Voltage-Current Relations in Nerve Cell Membrane of Onchidium verruculatum,”
  J. Physiol. 148:161-179 (1959)
5. Hagiwara, S., and Saito, N.,
  “Membrane Potential Change and Membrane Current in Supramedullary Nerve Cell of Puffer,”
  J. Neurophysiol. 22:204-221 (1959)
6. Hagiwara, S.,
  “Current-Voltage Relations of Nerve Cell Membrane,”
  “Electrical Activity of Single Cells,”
  Igakushoin, Hongo, Tokyo (1960)
7. Bullock, T. H.,
  “Parameters of Integrative Action of the Nervous System at the Neuronal Level,”
  Experimental Cell Research Suppl. 5:323-337 (1958)
8. Otani, T., and Bullock, T. H.,
  “Effects of Presetting the Membrane Potential of the Soma of Spontaneous and Integrating Ganglion Cells,”
  Physiological Zoology 32 (No. 2):104-114 (1959)
9. Bullock, T. H., and Terzuolo, C. A.,
  “Diverse Forms of Activity in the Somata of Spontaneous and Integrating Ganglion Cells,”
  J. Physiol. 138:343-364 (1957)
10. Bullock, T. H.,
  “Neuron Doctrine and Electrophysiology,”
  Science 129 (No. 3355):997-1002 (1959)
11. Chalazonitis, N., and Arvanitaki, A.,
  “Slow Waves and Associated Spiking in Nerve Cells of Aplysia,”
  Bull. Inst. Oceanogr. Monaco No. 1224:1-15 (1961)
12. Bullock, T. H.,
  “Properties of a Single Synapse in the Stellate Ganglion of Squid,”
  J. Neurophysiol. 11:343-364 (1948)
13. Bullock, T. H.,
  “Neuronal Integrative Mechanisms,”
  “Recent Advances in Invertebrate Physiology,”
  Scheer, B. T., ed., Eugene, Oregon:Univ. Oregon Press 1957
14. Hodgkin, A. L., and Huxley, A. F.,
  “Currents Carried by Sodium and Potassium Ions through the Membrane of the Giant Axon of Loligo,”
  J. Physiol. 116:449-472 (1952)
15. Hodgkin, A. L., and Huxley, A. F.,
  “The Components of Membrane Conductance in the Giant Axon of Loligo,”
[Pg 18] J. Physiol. 116:473-496 (1952)
16. Hodgkin, A. L., and Huxley, A. F.,
  “The Dual Effect of Membrane Potential on Sodium Conductance in the Giant Axon of Loligo,”
  J. Physiol. 116:497-506 (1952)
17. Hodgkin, A. L., and Huxley, A. F.,
  “A Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve,”
  J. Physiol. 117:500-544 (1952)
18. Lewis, E. R.,
  “An Electronic Analog of the Neuron Based on the Dynamics of Potassium and Sodium Ion Fluxes,”
  “Neural Theory and Modeling,”
  R. F. Reiss, ed., Palo Alto, California:Stanford University Press, 1964
19. Eccles, J. C.,
  Physiology of Synapses,
  Berlin:Springer-Verlag, 1963
20. Grundfest, H.,
  “Excitation Triggers in Post-Junctional Cells,”
  “Physiological Triggers,”
  T. H. Bullock, ed., Washington, D.C.:American Physiological Society, 1955
21. Rall, W.,
  “Membrane Potential Transients and Membrane Time Constants of Motoneurons,”
  Exp. Neurol. 2:503-532 (1960)
22. Araki, T., and Otani, T.,
  “The Response of Single Motoneurones to Direct Stimulation,”
  J. Neurophysiol. 18:472-485 (1955)
23. Moore, G. P., Perkel, D. H., and Segundo, J. P.,
  “Stability Patterns in Interneuronal Pacemaker Regulation,”
  Proceedings of the San Diego Symposium for Biomedical Engineering,
  San Diego, California, 1963
24. Eccles, J. C.,
  The Neurophysiological Basis of Mind,
  Oxford:Clarendon Press, 1952

[Pg 19]

Fields and Waves in Excitable Cellular Structures

R. M. STEWART

Space General Corporation
El Monte, California

“Study of living processes by the physiological method only proceeded laboriously behind the study of non-living systems. Knowledge about respiration, for instance, began to become well organized as the study of combustion proceeded, since this is an analogous operation....”

J. Z. Young (24)

INTRODUCTION

The study of electrical fields in densely-packed cellular media is prompted primarily by a desire to understand more fully the details of brain mechanism and its relation to behavior. Our work has specifically been directed toward an attempt to model such structures and mechanisms, using relatively simple inorganic materials.

The prototype for such experiments is the “Lillie[1] iron-wire nerve model.” Over a hundred years ago, it was observed that visible waves are produced on the surface of a piece of iron submerged in nitric acid when and where the iron is touched by a piece of zinc. After a short period of apparent fatigue, the wire recovers and can again support a wave when stimulated. Major support for the idea that such impulses are in fact directly related to peripheral nerve impulses came from Lillie around 1920. Along an entirely different line, various persons have noted the morphological and dynamic similarity between dendrites in the brain and those which sometimes grow by electrodeposition of metals from solution. Gordon Pask (17), especially, has pointed to this similarity and has discussed in a general way the concomitant possibility of a physical model for the persistent memory trace.

By combining and extending such concepts and techniques, we hope to produce a macroscopic model of “gray matter,” the structural matrix of which will consist of a dense, homogeneously-mixed, conglomerate of [Pg 20] small pellets, capable of supporting internal waves of excitation, of changing electrical behavior through internal fine-structure growth, and of forming temporal associations in response to peripheral shocks.

A few experimenters have subsequently pursued the iron-wire nerve-impulse analogy further, hoping thereby to illuminate the mechanisms of nerve excitation, impulse transmission and recovery, but interest has generally been quite low. The model has remained fairly undisturbed in the textbooks and lecture demonstrations of medical students, as a picturesque aid to their formal education. On the outer fringes of biology, still less interest has been displayed; the philosophical vitalists would surely be revolted by the idea of such models of mind and memory, and at the other end of the scale, contemporary computer engineers generally assume that a nerve cell operates much too slowly to be of any value. This lack of interest is certainly due, in part, to success in developing techniques for monitoring individual nerve fibers directly, to the point that it is just about as easy to work with large nerve fibers (and even peripheral and spinal junctions) as it is to work with iron wires. Under such circumstances, the model has only limited value, perhaps just to the extent that it emphasizes the role of factors other than specific molecular structure and local chemical reactions in the dynamics of nerve action.

When we leave the questions of impulse transmission on long fibers and peripheral junctions, however, and attempt to discuss the brain, there can be hardly any doubt that the development of a meaningful physical model technique would be of great value. Brain tissue is soft and sensitive; the cellular structures are small, tangled, and incredibly numerous. Therefore (Young (24)), “ ... physiologists hope that after having learned a lot about nerve-impulses in the nerves they will be able to go on to study how these impulses interact when they reach the brain. [But], we must not assume that we shall understand the brain only in the terms we have learned to use for the nerves. The function of nerves is to carry impulses—like telegraph wires. The functions of brains is something else.” But, confronted with such awesome experimental difficulties, with no comprehensive mathematical theory in sight, we are largely limited otherwise to verbal discourses, rationales and theorizing, a hopelessly clumsy tool for the development of an adequate understanding of brain function. A little over ten years ago Sperry (19) said, “Present day science is quite at a loss even to begin to describe the neural events involved in the simplest form of mental activity.” This situation has not changed much today. The development, study, and understanding of complex high-density cellular [Pg 21] structures which incorporate characteristics of both the Lillie and Pask models may, it is hoped, alleviate this situation. There would also be fairly obvious technological applications for such techniques, if highly developed; it is this prospect, more than any other consideration, which has prompted support for this work.

Experiments to date have been devised which demonstrate the following basic physical functional characteristics:

(1) Control of bulk resistivity of electrolytes containing closely-packed, poorly-conducting pellets

(2) Circulation of regenerative waves on closed loops

(3) Strong coupling between isolated excitable sites

(4) Logically-complete wave interactions, including facilitation and annihilation

(5) Dendrite growth by electrodeposition in “closed” excitable systems

(6) Subthreshold distributed field effects, especially in locally-refractory regions.

In addition, our attention has necessarily been directed to various problems of general experimental technique and choice of materials, especially as related to stability, fast recovery and long life. However, in order to understand the possible significance of, and motivation for, such experiments, some related modern concepts of neurophysiology, histology and psychology will be reviewed very briefly. These concepts concern, respectively: (1) cellular structure, (2) short-term memory, (3) the synapse, (4) inhibition, (5) long-term memory, and (6) field effects and learning.

SOME CONTEMPORARY CONCEPTS

Since we are attempting to duplicate processes other than chemical, per se, we will forego any reference to the extensive literature of neurochemistry. It should not be surprising though if, at the neglect of the fundamental biological processes of growth, reproduction and metabolism, it proves possible to imitate some learning mechanisms with [Pg 22] grossly less complex molecular structures. There is also much talk of chemical versus electrical theories and mechanisms in neurophysiology. The distinction, when it can be made, seems to hinge on the question of the scale of size of significant interactions. Thus, “chemical” interactions presumably take place at molecular distances, possibly as a result of or subsequent to a certain amount of thermal diffusion. “Electrical” interactions, on the other hand, are generally understood to imply longer range or larger scale macroscopic fields.

1. Cellular Structure

The human brain contains approximately 10¹⁰ neurons to which the neuron theory assigns the primary role in central nervous activity. These cells occupy, however, a relatively small fraction of the total volume. There are, for example, approximately 10 times that number of neuroglia, cells of relatively indeterminate function. Each neuron (consisting of cell body, dendrites and, sometimes, an axon) comes into close contact with the dendrites of other neurons at some thousands of places, these synapses and “ephapses” being spaced approximately 5μ apart (1). The total number of such apparent junctions is therefore of the order of 10¹³. In spite of infinite fine-structure variations, the cellular structure of the brain, when viewed with slightly blurred vision, is remarkably homogeneous. In the cortex, at least, the extensions of most cells are relatively short, and when the cortex is at rest, it appears from the large EEG alpha-rhythms that large numbers of cells beat together in unison. Quoting again from Sperry, “In short, current brain theory encourages us to try to correlate our subjective psychic experience with the activity of relatively homogeneous nerve cell units conducting essentially homogeneous impulses, through roughly homogeneous cerebral tissue.”
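The order-of-magnitude estimate of the junction count can be checked with a few lines of arithmetic. In this minimal sketch the neuron count and the glia ratio are taken from the text, while the figure of 1000 contacts per neuron is an assumed stand-in for "some thousands"; the function name is illustrative only.

```python
# Order-of-magnitude check for the junction count quoted in the text.
NEURONS = 10**10              # approximate neuron count (from the text)
GLIA_PER_NEURON = 10          # "approximately 10 times that number" of neuroglia
CONTACTS_PER_NEURON = 10**3   # ASSUMPTION: lower end of "some thousands"


def order_of_magnitude(n: int) -> int:
    """Return the exponent k such that 10**k <= n < 10**(k + 1), for n >= 1."""
    return len(str(n)) - 1


junctions = NEURONS * CONTACTS_PER_NEURON
neuroglia = NEURONS * GLIA_PER_NEURON

print(f"junctions of order 10^{order_of_magnitude(junctions)}")
print(f"neuroglia of order 10^{order_of_magnitude(neuroglia)}")
```

Taking several thousand contacts per neuron instead of one thousand changes the product by a small factor but leaves the order of magnitude unchanged.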

2. Short-Term Memory

A train of impulses simply travelling on a long fiber may, for example, be regarded as a short-term memory much in the same way as a delay line acts as a transient memory in a computer. A similar but slightly longer term memory may also be thought to exist in the form of waves circulating in closed loops (23). In fact, it is almost universally held today that most significant memory occurs in two basic interrelated ways: first, as such a short-term circulating, reverberatory or regenerative memory, which, however, could not [Pg 23] conceivably persist through such things as coma, anesthesia, concussion, extreme cold, deep sleep and convulsive seizures; and thus, secondly, as a long-term memory trace which must somehow reside in a semipermanent fine-structural change. As Hebb (9) stated, “A reverberatory trace might cooperate with a structural change and carry the memory until the growth change is made.”
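The delay-line analogy can be made concrete with a small sketch of a recirculating loop: a pulse pattern is held only as long as each pulse emerging from the end of the line is regenerated at its input. The function name and the example pattern are hypothetical, and the model ignores attenuation and timing jitter.

```python
from collections import deque


def recirculate(pattern, cycles):
    """Pass a pulse pattern around a closed delay loop for `cycles` full turns.

    Returns the loop contents afterwards and the sequence of pulses observed
    at the output tap.
    """
    loop = deque(pattern)
    observed = []
    for _ in range(cycles * len(pattern)):
        pulse = loop.popleft()   # pulse emerges from the end of the line
        observed.append(pulse)
        loop.append(pulse)       # and is regenerated at the input
    return list(loop), observed


pattern = [1, 0, 0, 1, 1, 0]
stored, observed = recirculate(pattern, cycles=3)
print(stored)     # the pattern is still held after three turns
print(observed)   # the same pattern, read out once per turn
```

If the regeneration step is interrupted even once, the loop empties and the pattern is lost, which is the sense in which such a memory could not persist through coma or deep anesthesia.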

3. The Synapse

The current most highly regarded specific conception of the synapse is largely due to and has been best described by Eccles (5): “ ... the synaptic connections between nerve cells are the only functional connections of any significance. These synapses are of two types, excitatory and inhibitory, the former type tending to make nerve cells discharge impulses, the other to suppress the discharge. There is now convincing evidence that in vertebrate synapses each type operates through specific chemical transmitter substances ...”. In response to a presentation by Hebb (10), Eccles was quoted as saying, “One final point, and that is if there is electrical interaction, and we have seen from Dr. Estable’s work the complexity of connections, and we now know from the electronmicroscopists that there is no free space, only 200 Å clefts, everywhere in the central nervous system, then everything should be electrically interacted with everything else. I think this is only electrical background noise and, that when we lift with specific chemical connections above that noise we get a significant operational system. I would say that there is electrical interaction but it is just a noise, a nuisance.” Eccles’ conclusions are primarily based on data obtained in the peripheral nervous system and the spinal cord. But there is overwhelming reason to expect that cellular interactions in the brain are an entirely different affair. For example, “The highest centres in the octopus, as in vertebrates and arthropods, contain many small neurons. This finding is such a commonplace, that we have perhaps failed in the past to make the fullest inquiry into its implications. Many of these small cells possess numerous processes, but no axon. It is difficult to see, therefore, that their function can be conductive in the ordinary sense. 
Most of our ideas about nervous functioning are based on the assumption that each neuron acts essentially as a link in some chain of conduction, but there is really no warrant for this in the case of cells with many short branches. Until we know more of the relations of these processes to each other in the neuropile it would be unwise to say more. It is possible that the effective part of the [Pg 24] discharge of such cells is not as it is in conduction in long pathways, the internal circuit that returns through the same fiber, but the external circuit that enters other processes, ...” (3).

4. Inhibition

The inhibitory chemical transmitter substance postulated by Eccles has never been detected in spite of numerous efforts to do so. The mechanism(s) of inhibition is perhaps the key to the question of cellular interaction and, in one form or another, must be accounted for in any adequate theory.

Other rather specific forms of excitation and inhibition interaction have been proposed at one time or another. Perhaps the best example is the polar neuron of Gesell (8) and, more recently, Retzlaff (18). In such a concept, excitatory and inhibitory couplings differ basically because of a macroscopic structural difference at the cellular level; that is, various arrangements or orientation of intimate cellular structures give rise to either excitation or inhibition.

5. Long-Term Memory

Most modern theories of semipermanent structural change (or engrams, as they are sometimes called) look either to the molecular level or to the cellular level. Various specific locales for the engram have been suggested, including (1) modifications of RNA molecular structure, (2) changes of cell size, synapse area or dendrite extensions, (3) neuropile modification, and (4) local changes in the cell membrane. There is, in fact, rather direct evidence of the growth of neurons or their dendrites with use and the diminution or atrophy of dendrites with disuse. The apical dendrite of pyramidal neurones becomes thicker and more twisted with continuing activity, nerve fibers swell when active, sprout additional branches (at least in the spinal cord) and presumably increase the size and number of their terminal knobs. As pointed out by Konorski (11), the morphological conception of plasticity according to which plastic changes would be related to the formation and multiplication of new synaptic junctions goes back at least as far as Ramon y Cajal in 1904. Whatever the substrate of the memory trace, it is, at least in adults, remarkably immune to extensive brain damage and as Young (24) has said: “ ... this question of the nature of the memory trace is one of the most obscure and disputed in the whole of biology.” [Pg 25]

6. Field Effects and Learning

First, from Boycott and Young (3), “The current conception, on which most discussions of learning still concentrate, is that the nervous system consists essentially of an aggregate of chains of conductors, linked at key points by synapses. This reflex conception, springing probably from Cartesian theory and method, has no doubt proved of outstanding value in helping us to analyse the actions of the spinal cord, but it can be argued that it has actually obstructed the development of understanding of cerebral function.”

Most observable evidence of learning and memory is extremely complex and its interpretation full of traps. Learning in its broadest sense might be detected as a semipermanent change of behavior pattern brought about as a result of experience. Within that kind of definition, we can surely identify several distinctly different types of learning, presumably with distinctly different kinds of mechanisms associated with each one. But, if we are to stick by our definition of a condition of semipermanent change of behavior as a criterion for learning, then we may also be misled into considering the development of a neurosis, for example, as learning, or even a deep coma as learning.

When we come to consider field effects, current theories tend to get fairly obscure, but there seems to be an almost universal recognition of the fact that such fields are significant. For example, Morrell (16) says in his review of electrophysiological contributions to the neural basis of learning, “A growing body of knowledge (see reviews by Purpura, Grundfest, and Bishop) suggests that the most significant integrative work of the central nervous system is carried on in graded response elements—elements in which the degree of reaction depends upon stimulus intensity and is not all-or-none, which have no refractory period and in which continuously varying potential changes of either sign occur and mix and algebraically sum.” Gerard (7) also makes a number of general comments along these lines. “These attributes of a given cell are, in turn, normally controlled by impulses arising from other regions, by fields surrounding them—both electric and chemical—electric and chemical fields can strongly influence the interaction of neurones. This has been amply expounded in the case of the electric fields.”

Learning situations involving “punishment” and “reward” or, subjectively, “pain” and “pleasure” may very likely be associated with transient but structurally widespread field effects. States of distress [Pg 26] and of success seem to exert a lasting influence on behavior only in relation to simultaneous sensory events or, better yet, sensory events just immediately preceding in time. For example, the “anticipatory” nature of a conditioned reflex has been widely noted (21). From a structural point of view, it is as if recently active sites regardless of location or function were especially sensitive to extensive fields. There is a known inherent electrical property of both nerve membrane and passive iron surface that could hold the answer to this mechanism of spatially-diffuse temporal association; namely, the surface resistance drops to less than 1 per cent of its resting value during the refractory period which immediately follows activation.

EXPERIMENTAL TECHNIQUE

In almost all experiments, the basic signal-energy mechanism employed has been essentially that one studied most extensively by Lillie (12), Bonhoeffer (2), Yamagiwa (22), Matumoto and Goto (14) and others, i.e., activation, impulse propagation and recovery on the normally passive surface of a piece of iron immersed in nitric acid or of cobalt in chromic acid (20). The iron we have used most frequently is of about 99.99% purity, which gives performance more consistent than but similar to that obtained using cleaned “coat-hanger” wires. The acid used most frequently by us is an aqueous solution of about 53-55% by weight, substantially more dilute than that predominantly used by previous investigators. The most frequently reported concentration has been 68-70%. Such a solution is quite stable and, hence, much easier to work with in open containers than the weaker solutions; it results in very fast waves but gives, at room temperatures, a very long refractory period (typically, 15 minutes). A noble metal (such as silver, gold or platinum) placed in contact with the surface of the iron has a stabilizing effect (14), presumably through the action of local currents, and provides a simple and useful technique whereby, with dilution, both stability and fast recovery (1 second) can be achieved in simple demonstrations and experiments.

Experiments involving the growth by electrodeposition and study of metallic dendrites are done with an eye toward electrical, physical and chemical compatibility with the energy-producing system outlined above. Best results to date (from the standpoints of stability, non-reactivity, and morphological similarity to neurological structures) have been obtained by dissolving various amounts of gold chloride salt in 53-55% HNO₃. [Pg 27]

An apparatus has been devised and assembled for the purpose of containing and controlling our primary experiments. (See Figure 1). Its two major components are a test chamber (on the left in Figure 1) and a fluid exchanger (on the right). In normal operation the test chamber, which is very rigid and well sealed after placing the experimental assembly inside, is completely filled with electrolyte (or, initially, an inert fluid) to the exclusion of all air pockets and bubbles. Thus encapsulated, it is possible to perform experiments which would otherwise be impossible due to instability. The instability which plagues such experiments is manifested in copious generation of bubbles on and subsequent rapid disintegration of all “excitable” material (i.e., iron). Preliminary experiments indicated that such “bubble instability” could be suppressed by constraining the volume available to expansion. In particular, response and recovery times can now be decreased substantially and work can proceed with complex systems of interest such as aggregates containing many small iron pellets.

The test chamber is provided with a heater (and thermostatic control) which makes possible electrochemical impulse response and recovery times comparable to those of the nervous system (1 to 10 msec). The fluid-exchanger is so arranged that fluid in the test chamber can be arbitrarily changed or renewed by exchange within a rigid, sealed, completely liquid-filled (“isochoric”) loop. Thus, stability can be maintained for long periods of time and over a wide variety of investigative or operating conditions.

Most of the parts of this apparatus are made of stainless steel and are sealed with polyethylene and teflon. There is a small quartz observation window on the test chamber, two small lighting ports, a pressure transducer, thermocouple, screw-and-piston pressure actuator and umbilical connector for experimental electrical inputs and outputs.

BASIC EXPERIMENTS

The basic types of experiments described in the following sections are numbered for comparison to correspond roughly to related neurophysiological concepts summarized in the previous section.

1. Cellular Structure

The primary object of our research is the control and determination of dynamic behavior in response to electrical stimulation in close-packed aggregates of small pellets submerged in electrolyte. Typically, the aggregate contains (among other things) iron and the electrolyte contains nitric acid, this combination making possible the propagation of electrochemical surface waves of excitation through the body of the aggregate similar to those of the Lillie iron-wire nerve model. The iron pellets are imbedded in and supported by a matrix of small dielectric (such as glass) pellets. Furthermore, with the addition of soluble salts of various noble metals to the electrolyte, long interstitial dendritic or fibrous structures of the second metal can be formed whose length and distribution change by electrodeposition in response to either internal or externally generated fields. [Pg 28]

Figure 1—Test chamber and fluid exchanger

[Pg 29] Coupling between isolated excitable (iron) sites is greatly affected by the fine structure and effective bulk resistivity of the glass and fluid medium which supports and fills the space between such sites. In general (see Section 3, following) it is necessary, to promote strong coupling between small structures, to impede the “short-circuit” return flow of current from an active or excited surface, through the electrolyte and back through the dendritic structure attached to the same excitable site. This calls for control (increase) of the bulk resistivity, preferably by means specifically independent of electrolyte composition, which relates to and affects surface phenomena such as recovery (i.e., the “refractory” period). Figure 2 illustrates the way in which this is being done, i.e., by appropriate choice of particle size distributions. The case illustrated shows the approximate proper volume ratios for maximum resistivity in a two-size-phase random mixture of spheres.

2. Regenerative Loops

Figure 3 shows an iron loop (about 2-inch diameter) wrapped with a silver wire helix which is quite stable in 53-55% acid and which will easily support a circulating pattern of three impulses. For demonstration, unilateral waves can be generated by first touching the iron with a piece of zinc (which produces two oppositely travelling waves) and then blocking one of them with a piece of platinum or a small platinum screen attached to the end of a stick or wand. Carbon blocks may also be used for this purpose.

The smallest regenerative or reverberatory loop which we are at present able to devise is about 1 mm in diameter. Multiple waves, as expected, produce stable patterns in which all impulses are equally spaced. This phenomenon can be related to the slightly slower speed characteristic of the relative refractory period as compared with a more fully recovered zone. [Pg 30]
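The tendency of multiple impulses to space themselves equally can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the speed law (exponential recovery with the gap to the impulse ahead), the parameter values and the function name are all hypothetical, chosen to express "slower in the relative refractory zone, faster in a more fully recovered zone", and are not measurements from the iron-wire loop.

```python
import math


def simulate_ring(positions, circumference=1.0, v_max=1.0,
                  recovery_length=0.1, dt=0.001, steps=30000):
    """Advance two or more impulses around a closed loop and return final gaps.

    ASSUMED speed law: an impulse travels more slowly when it follows closely
    behind another, because the medium ahead of it has had less time to recover.
    """
    x = sorted(p % circumference for p in positions)
    n = len(x)
    for _ in range(steps):
        # gap from impulse i forward to the next impulse, wrapping round the loop
        gaps = [(x[(i + 1) % n] - x[i]) % circumference for i in range(n)]
        speeds = [v_max * (1.0 - math.exp(-g / recovery_length)) for g in gaps]
        x = [(xi + v * dt) % circumference for xi, v in zip(x, speeds)]
    return [(x[(i + 1) % n] - x[i]) % circumference for i in range(n)]


# Three impulses started with very unequal spacing (gaps 0.05, 0.25, 0.70).
final_gaps = simulate_ring([0.00, 0.05, 0.30])
print([round(g, 3) for g in final_gaps])
```

Any speed law that increases with the gap gives the same qualitative result: a closely following impulse is slowed and a widely separated one catches up, so unequal spacings decay toward a uniform pattern.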

Figure 2—Conductivity control—mixed pellet-size aggregates

[Pg 31]

Figure 3—Regenerative or reverberatory loop

3. Strong Coupling

If two touching pieces of iron are placed in a bath of nitric acid, a wave generated on one will ordinarily spread to the other. As is to be expected, a similar result is obtained if the two pieces are connected through an external conducting wire. However, if they are isolated, strong coupling does not ordinarily occur, especially if the elements are small in comparison with a “critical size,” σ/ρ where σ is the surface resistivity of passive iron surface (in Ω-cm²) and ρ is the volume resistivity of the acid (in Ω-cm). A simple and informative structure which demonstrates the essential conditions for strong electrical coupling between isolated elements of very small size may be constructed as shown in Figure 4. The dielectric barrier insures that charge transfer through one dipole must be accompanied by an equal and opposite transfer through the surfaces of the other dipole. If the “inexcitable” silver tails have sufficiently high conductance (i.e., sufficiently large surface area, hence preferably, dendrites), strong coupling will occur, just as though the cores of the two pieces of iron were connected with a solid conducting wire. [Pg 32]
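The "critical size" σ/ρ has the dimensions of a length, since (Ω-cm²)/(Ω-cm) = cm. The sketch below simply evaluates that ratio and compares an element against it. The numerical values, the function names and the 10% margin used to decide what counts as "small" are illustrative assumptions, not figures from these experiments.

```python
def critical_size_cm(surface_resistivity_ohm_cm2, volume_resistivity_ohm_cm):
    """Critical size sigma/rho in cm: surface resistivity over volume resistivity."""
    if volume_resistivity_ohm_cm <= 0:
        raise ValueError("volume resistivity must be positive")
    return surface_resistivity_ohm_cm2 / volume_resistivity_ohm_cm


def is_small_element(size_cm, critical_cm, margin=0.1):
    """ASSUMED criterion: 'small' means below `margin` times the critical size."""
    return size_cm < margin * critical_cm


# Hypothetical values, for illustration only.
sigma = 10.0   # ohm-cm^2, passive iron surface (assumed)
rho = 2.0      # ohm-cm, acid (assumed)

d_crit = critical_size_cm(sigma, rho)
print(d_crit)                          # critical size in cm
print(is_small_element(0.1, d_crit))   # a 1 mm pellet
print(is_small_element(2.0, d_crit))   # a 2 cm piece
```

Raising the effective bulk resistivity of the medium lowers the critical size, which is one way of seeing why control of bulk resistivity matters for coupling between small elements.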

Figure 4

Figure 5—Electrochemical excitatory-inhibitory interaction cell

[Pg 33]

4. Inhibitory Coupling

If a third “dipole” is inserted through the dielectric membrane in the opposite direction, then excitation of this isolated element tends to inhibit the response which would otherwise be elicited by excitation of one of the parallel dipoles. Figure 5 shows the first such “logically-complete” interaction cell successfully constructed and demonstrated. It may be said to behave as an elementary McCulloch-Pitts neuron (15). Further analysis shows that similar structures incorporating many dipoles (both excitatory and inhibitory) can be made to behave as general “linear decision functions” in which all input weights are approximately proportional to the total size or length of their corresponding attached dendritic structures.
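The behavior described here corresponds to a linear threshold unit. The following minimal sketch treats each dipole as a binary input whose weight is proportional to its dendrite length, with sign +1 for dipoles inserted in the excitatory direction and -1 for the opposed, inhibitory direction. The function name, the unit lengths and the threshold value are assumptions made for illustration.

```python
def interaction_cell(inputs, dendrite_lengths, signs, threshold):
    """Linear decision function: respond (1) if the signed, length-weighted
    sum of the active inputs reaches the threshold, otherwise stay silent (0)."""
    total = sum(s * w * x for x, w, s in zip(inputs, dendrite_lengths, signs))
    return 1 if total >= threshold else 0


# Two parallel excitatory dipoles and one inhibitory dipole (hypothetical values).
lengths = [1.0, 1.0, 1.0]
signs = [+1, +1, -1]
threshold = 1.0

print(interaction_cell([1, 0, 0], lengths, signs, threshold))  # one excitatory input alone
print(interaction_cell([1, 0, 1], lengths, signs, threshold))  # same input plus inhibition
print(interaction_cell([1, 1, 1], lengths, signs, threshold))  # both excitatory inputs plus inhibition
```

With these values a single excitatory input elicits a response, the inhibitory input cancels it, and a second excitatory input restores it.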

5. Dendrite Growth

Figure 6 shows a sample gold dendrite grown by electrodeposition (actual size, about 1 mm) from a 54% nitric acid solution to which gold chloride was added. When such a dendrite is attached to a piece of iron (both submerged), activation of the excitable element produces a field in such a direction as to promote further growth of the dendritic structure. Thus, if gold chloride is added to the solution used in the elementary interaction cells described above, all input influence “weights” tend to increase with use and, hence, produce a plasticity of function.
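The statement that input weights "tend to increase with use" can be sketched as a simple update rule in which a dendrite lengthens a little each time its excitable element is activated. The linear growth law, the increment of 0.05 per activation and the function name are assumptions; the text does not give a quantitative growth rate.

```python
def grow_with_use(lengths, activations, growth_per_activation=0.05):
    """Return dendrite lengths after one step.

    activations[i] is 1 if excitable element i was activated in this step,
    else 0. ASSUMED law: each activation adds a fixed increment of length.
    """
    return [length + growth_per_activation * a
            for length, a in zip(lengths, activations)]


lengths = [1.0, 1.0]
for _ in range(10):
    # the first input is used on every step, the second never
    lengths = grow_with_use(lengths, [1, 0])

print([round(v, 3) for v in lengths])
```

Since each input weight is taken to be proportional to dendrite length, the frequently used input ends with the larger influence while the unused one is unchanged.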

6. Field Effects in Locally-Refractory Regions

Our measurements indicate that, during the refractory period following excitation, the surface resistance of iron in nitric acid drops to substantially less than 1% of its resting value in a manner reminiscent of nerve membranes (4). Thus, if a distributed or gross field exists at any time throughout a complex cellular aggregate, concomitant current densities in locally-refractory regions will be substantially higher than elsewhere and, if conditions appropriate to dendrite growth exist (as described above), growth rates in such regions will also be substantially higher than elsewhere. It would appear that, as a result, recently active functional couplings (in contrast to those not associated with recent neural activity) should be significantly altered by widely distributed fields or massive peripheral shocks. This mechanism might thus explain the apparent ability of the brain to form specific temporal associations in response to spatially-diffuse effects such as are generated, for example, by the pain receptors. [Pg 34]
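The size of the effect can be estimated with a very rough sketch. It assumes that the surface resistance dominates the current path (bulk electrolyte resistance is neglected), that the refractory resistance is exactly 1% of the resting value (the text gives this only as an upper bound), and that the deposition rate is proportional to current density, in the spirit of Faraday's law of electrolysis. All names and unit values are illustrative.

```python
REFRACTORY_FRACTION = 0.01   # ASSUMPTION: upper bound from the text, "less than 1%"


def surface_current_density(field_potential, resting_resistance, refractory):
    """Current density through a surface driven by a distributed field.

    ASSUMPTION: the surface resistance is the only significant series element.
    """
    factor = REFRACTORY_FRACTION if refractory else 1.0
    return field_potential / (resting_resistance * factor)


def growth_rate(current_density, k=1.0):
    """ASSUMED deposition law: growth proportional to current density."""
    return k * current_density


recent = growth_rate(surface_current_density(1.0, 1.0, refractory=True))
rested = growth_rate(surface_current_density(1.0, 1.0, refractory=False))
ratio = recent / rested
print(ratio)
```

Under these assumptions a recently active site grows about a hundred times faster than a resting one in the same field; any appreciable bulk resistance in series would reduce that ratio.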

Figure 6—Dendritic structures, living and non-living. (a) Cat dendrite trees (from Bok, “Histonomy of the Cerebral Cortex,” Elsevier, 1959); (b) Electrodeposited gold dendrite tree.

[Pg 35]

SUMMARY

An attempt is being made to develop meaningful electrochemical model techniques which may contribute toward a clearer understanding of cortical function. Two basic phenomena are simultaneously employed which are variants of (1) the Lillie iron-wire nerve model, and (2) growth of metallic dendrites by electrodeposition. These phenomena are being induced particularly within dense cellular aggregates of various materials whose interstitial spaces are flooded with liquid electrolyte.

REFERENCES

1. Bok, S. T.,
  “Histonomy of the Cerebral Cortex,”
  Amsterdam, London:Elsevier Publishing Co., New York:Princeton, 1959
2. Bonhoeffer, K. F.,
  “Activation of Passive Iron as a Model for the Excitation of Nerve,”
  J. Gen. Physiol. 32:69-91 (1948).
  This paper summarizes work carried out during 1941-1946 at the University of Leipzig, and published during the war years in German periodicals.

3. Boycott, B. B., and Young, J. Z.,
  “The Comparative Study of Learning,”
  S. E. B. Symposia, No. IV
  “Physiological Mechanisms in Animal Behavior,”
  Cambridge: University Press, USA:Academic Press, Inc., 1950
4. Cole, K. S., and Curtis, H. J.,
  “Electric Impedance of the Squid Giant Axon During Activity,”
  J. Gen. Physiol. 22:649-670 (1939)
5. Eccles, J. C.,
  “The Effects of Use and Disuse of Synaptic Function,”
  “Brain Mechanisms and Learning—A Symposium,”
  organized by the Council for International Organizations of Medical Science, Oxford:Blackwell Scientific Publications, 1961

6. Franck, U. F.,
  “Models for Biological Excitation Processes,”
  “Progress in Biophysics and Biophysical Chemistry,”
  J. A. V. Butler, ed., London and New York:Pergamon Press, pp. 171-206, 1956
7. Gerard, R. W.,
  “Biological Roots of Psychiatry,”
  Science 122 (No. 3162):225-230 (1955)
8. Gesell, R.,
  “A Neurophysiological Interpretation of the Respiratory Act,”
  Ergebn. Physiol. 43:477-639 (1940)
9. Hebb, D. O.,
  “The Organization of Behavior, A Neuropsychological Theory,”
  New York:John Wiley and Sons, 1949
10. Hebb, D. O.,
  “Distinctive Features of Learning in the Higher Animal,”
  “Brain Mechanisms and Learning—A Symposium,”
  organized by the Council for International Organizations of Medical Science, Oxford:Blackwell Scientific Publications, 1961

11. Konorski, J.,
  “Conditioned Reflexes and Neuron Organization,”
[Pg 36] Cambridge:Cambridge University Press, 1948
12. Lillie, R. S.,
  “Factors Affecting the Transmission and Recovery in the Passive Iron Nerve Model,”
  J. Gen. Physiol. 4:473 (1925)
13. Lillie, R. S.,
  Biol. Rev. 16:216 (1936)
14. Matumoto, M., and Goto, K.,
  “A New Type of Nerve Conduction Model,”
  The Gunma Journal of Medical Sciences 4(No. 1) (1955)
15. McCulloch, W. S., and Pitts, W.,
  “A Logical Calculus of the Ideas Immanent in Nervous Activity,”
  Bulletin of Mathematical Biophysics 5:115-133 (1943)
16. Morrell, F.,
  “Electrophysiological Contributions to the Neural Basis of Learning,”
  Physiological Reviews 41(No. 3) (1961)
17. Pask, G.,
  “The Growth Process Inside the Cybernetic Machine,”
  Proc. 2nd Congress International Association Cybernetics, Gauthier-Villars, Paris:Namur, 1958
18. Retzlaff, E.,
  “Neurohistological Basis for the Functioning of Paired Half-Centers,”
  J. Comp. Neurology 101:407-443 (1954)
19. Sperry, R. W.,
  “Neurology and the Mind-Brain Problem,”
  Amer. Scientist 40(No. 2): 291-312 (1952)
20. Tasaki, I., and Bak, A. F.,
  J. Gen. Physiol. 42:899 (1959)
21. Thorpe, W. H.,
  “The Concepts of Learning and Their Relation to Those of Instinct,”
  S. E. B. Symposia, No. IV,
  “Physiological Mechanisms in Animal Behavior,”
  Cambridge:University Press, USA:Academic Press, Inc., 1950
22. Yamagiwa, K.,
  “The Interaction in Various Manifestations (Observations on Lillie’s Nerve Model),”
  Jap. J. Physiol. 1:40-54 (1950)
23. Young, J. Z.,
  “The Evolution of the Nervous System and of the Relationship of Organism and Environment,”
  G. R. de Beer, ed.,
  “Evolution,”
  Oxford:Clarendon Press, pp. 179-204, 1938
24. Young, J. Z.,
  “Doubt and Certainty in Science, A Biologist’s Reflections on the Brain,”
  New York:Oxford Press, 1951

[Pg 37]

Multi-Layer Learning Networks

R. A. STAFFORD

Philco Corp., Aeronutronic Division
Newport Beach, California

INTRODUCTION

This paper is concerned with the problem of designing a network of linear threshold elements capable of efficiently adapting its various sets of weights so as to produce a prescribed input-output relation. It is to accomplish this adaptation by being repetitively presented with the various inputs along with the corresponding desired outputs. We will not be concerned here with the further requirement of various kinds of ability to “generalize”—i.e., to tend to give correct outputs for inputs that have not previously occurred when they are similar in some transformed sense to other inputs that have occurred.

In putting forth a model for such an adapting or “learning” network, a requirement is laid down that the complexity of the adaption process in terms of interconnections among elements needed for producing appropriate weight changes, should not greatly exceed that already required to produce outputs from inputs with a static set of weights. In fact, it has been found possible to use the output-from-input computing capacity of the network to help choose proper weight changes by observing the effect on the output of a variety of possible weight changes.

No attempt is made here to defend the proposed network model on theoretical grounds since no effective theory is known at present. Instead, the plausibility of the various aspects of the network model, combined with empirical results, must suffice.

SINGLE ELEMENTS

To simplify the problem it is assumed that the network receives a set of two-valued inputs, x₁, x₂, ..., xₙ, and is required to produce only a single two-valued output, y. It is convenient to assign the numerical quantities +1 and -1 to the two values of each variable.

The simplest network would consist of a single linear threshold element with a set of weights, c₀, c₁, c₂, ..., cₙ. These determine the [Pg 38] output-input relation or function so that y is +1 or -1 according as the quantity, c₀ + c₁x₁ + c₂x₂ + ... + cₙxₙ, is positive or not, respectively. It is possible for such a single element to exhibit an adaptive behavior as follows. If, for a given set, x₁, x₂, ..., xₙ, the output, y, is correct, then make no changes to the weights. Otherwise change the weights according to the equations

It has been shown by a number of people that the weights of such an element are assured of arriving at a set of values which produce the correct output-input relation after a sufficient number of errors, provided that such a set exists. An upper bound on the number of possible errors can be given which depends only on the initial weight values and the logical function to be learned. This does not, however, solve our network problem for two reasons.
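The adaptive behavior of a single element can be sketched in Python. The weight-change equations themselves are not reproduced in this text, so the sketch assumes the familiar fixed-increment rule: on an error, c₀ is incremented by d and each cᵢ by d·xᵢ, where d is the desired output. The function names and the three-input majority example are illustrative only.

```python
from itertools import product

def output(c, x):
    """+1 if c0 + c1*x1 + ... + cn*xn is positive, otherwise -1."""
    s = c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))
    return 1 if s > 0 else -1

def train(target, n, max_passes=1000):
    """Cycle through all 2**n inputs until one full pass is error-free.
    Assumed rule: on an error, c0 += d and ci += d*xi (d = desired output)."""
    c = [0] * (n + 1)
    errors = 0
    for _ in range(max_passes):
        clean = True
        for x in product((-1, 1), repeat=n):
            d = target(x)
            if output(c, x) != d:
                c[0] += d
                for i, xi in enumerate(x, start=1):
                    c[i] += d * xi
                errors += 1
                clean = False
        if clean:
            return c, errors
    raise RuntimeError("no error-free pass within max_passes")

def majority3(x):
    """A function that a single element can generate."""
    return 1 if sum(x) > 0 else -1

weights, errors = train(majority3, 3)
print(weights, errors)
```

Under this assumed rule the loop stops only after a complete error-free pass, which matches the convergence property stated above for functions a single element can generate.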

First, as the number, n, of inputs gets large, the number of errors to be expected for most functions which can be learned increases to unreasonable values. For example, for n = 6, most such functions result in 500 to 1000 errors compared to an average of 32 errors to be expected in a perfect learning device.

Second, and more important, the fraction of those logical functions which can be generated in a single element becomes vanishingly small as n increases. For example, at n = 6 less than one in each three trillion logical functions is so obtainable.
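The trend can be checked by brute force for very small n. The sketch below counts the distinct truth tables that one element can generate when its weights are small integers; the weight range of -2 to +2 is an assumption that is adequate for n = 2 and n = 3 but not for larger n, and the function name is illustrative.

```python
from itertools import product

def count_threshold_functions(n, wmax=2):
    """Count distinct truth tables realisable by one element whose weights
    c0..cn are integers in [-wmax, wmax]; inputs are +1/-1 and the output
    is +1 when the weighted sum is positive, -1 otherwise."""
    inputs = list(product((-1, 1), repeat=n))
    tables = set()
    for c in product(range(-wmax, wmax + 1), repeat=n + 1):
        table = tuple(
            1 if c[0] + sum(ci * xi for ci, xi in zip(c[1:], x)) > 0 else -1
            for x in inputs
        )
        tables.add(table)
    return len(tables)

for n in (2, 3):
    total = 2 ** (2 ** n)          # number of logical functions of n inputs
    found = count_threshold_functions(n)
    print(n, found, total, found / total)
```

For n = 2 this finds 14 of the 16 logical functions and for n = 3 it finds 104 of the 256, so the obtainable fraction is already falling quickly at these sizes.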

NETWORKS OF ELEMENTS

It can be demonstrated that if a sufficiently large number of linear threshold elements is used, with the outputs of some being the inputs of others, then a final output can be produced which is any desired logical function of the inputs. The difficulty in such a network lies in the fact that we are no longer provided with a knowledge of the correct output for each element, but only for the final output. If the final output is incorrect there is no obvious way to determine which sets of weights should be altered.

As a result of considerable study and experimentation at Aeronutronic, a network model has been evolved which, it is felt, will get around these difficulties. It consists of four basic features which will now be described. [Pg 39]

Positive Interconnecting Weights

It is proposed that all weights in elements attached to inputs which come from other elements in the network be restricted to positive values. (Weights attached to the original inputs to the network, of course, must be allowed to be of either sign.) The reason for such a restriction is this. If element 1 is an input to element 2 with weight c₁₂, element 2 to element 3 with weight c₂₃, etc., then the sign of the product, c₁₂c₂₃ ..., gives the sense of the effect of a change in the output of element 1 on the final element in the chain (assuming this is the only such chain between the two elements). If these various weights were of either possible sign, then a decision as to whether or not to change the output in element 1 to help correct an error in the final element would involve all weights in the chain. Moreover, since there would in general be a multiplicity of such chains, the decision is rendered impossibly difficult.

The above restriction removes this difficulty. If the output of any element in the network is changed, say, from -1 to +1, the effect on the final element, if it is affected at all, is in the same direction.

It should be noted that this restriction does not seriously affect the logical capabilities of a network. In fact, if a certain logical function can be achieved in a network with the use of weights of unrestricted sign, then the same function can be generated in another network with only positive interconnecting weights and, at worst, twice the number of elements. In the worst case this is done by generating in the restricted network both the output and its complement for each element of the unrestricted network. (It is assumed that there are no loops in the network.)
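The effect of the sign restriction can be illustrated with a small loop-free network. In the sketch below, which is an illustration rather than the author's simulation, element k receives the network inputs with weights of either sign and the outputs of elements 0..k-1 with non-negative weights (zero standing for "no connection"). Forcing any element from -1 to +1 is then checked never to move the final element from +1 to -1. All names and weight ranges are assumptions.

```python
import random
from itertools import product

def run(net, x, forced=None):
    """Evaluate elements in order; `forced` maps an element index to an
    output value that overrides the computed one."""
    outs = []
    for k, (c0, w_in, w_el) in enumerate(net):
        s = (c0 + sum(w * xi for w, xi in zip(w_in, x))
             + sum(w * o for w, o in zip(w_el, outs)))
        o = 1 if s > 0 else -1
        if forced and k in forced:
            o = forced[k]
        outs.append(o)
    return outs

def random_net(n_inputs, n_elements, rng):
    net = []
    for k in range(n_elements):
        c0 = rng.randint(-3, 3)
        w_in = [rng.randint(-3, 3) for _ in range(n_inputs)]  # either sign
        w_el = [rng.randint(0, 3) for _ in range(k)]          # restricted sign
        net.append((c0, w_in, w_el))
    return net

def is_monotone(net, n_inputs):
    """True if raising any non-final element from -1 to +1 never lowers
    the output of the final element, for every input."""
    for x in product((-1, 1), repeat=n_inputs):
        base = run(net, x)
        for k in range(len(net) - 1):
            if base[k] == -1 and run(net, x, {k: 1})[-1] < base[-1]:
                return False
    return True

print(all(is_monotone(random_net(3, 5, random.Random(s)), 3) for s in range(20)))
```

Because every interconnecting weight is non-negative, raising one output can only raise the sums of later elements, which is why the check holds for every chain at once without examining individual weights.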

A Variable Bias

The central problem in network learning is that of determining, for a given input, the set of elements whose outputs can be altered so as to correct the final element, and which will do the least amount of damage to previous adaptations to other inputs. Once this set has been determined, the incrementing rule given for a single element will apply in this case as well (subject to the restriction of leaving interconnecting weights positive), since the desired final output coincides with that desired for each of the elements to be changed (because of positive interconnecting weights).

In the process of arriving at such a decision three factors need to be considered. Elements selected for change should tend to be those whose [Pg 40] output would thereby be affected for a minimum number of other possible inputs. At the same time it should be ascertained that a change in each of the elements in question does indeed contribute significantly towards correcting the final output. Finally, a minimum number of such elements should be used.

It would appear at first that this kind of decision is impossible to achieve if the complexity of the decision apparatus is kept comparable to that of the basic input-output network as mentioned earlier. However, in the method to be described it is felt that a reasonable approximation to these requirements will be achieved without an undue increase in complexity.

It is assumed that in addition to its normal inputs, each element receives a variable input bias which we can call b. The output of every element should then be determined by the sign of the usual weighted sum of its inputs plus this bias quantity. This bias is to be the same for each element of the network. If b = 0 the network will behave as before. However, if b is increased gradually, various elements throughout the network will commence changing from -1 to +1, with one or a few changing at any one time as a rule. If b is decreased, the opposite will occur.

Now suppose that for a given input the final output ought to be +1 but actually is -1. Assume that b is then raised so high that this final output is corrected. Then commence a gradual decline in b. Various elements may revert to -1, but until the final output does, no weights are changed. When the final output does revert to -1, it is due to an element’s having a sum (weighted sum plus bias) which just passed down through zero. This then caused a chain effect of changing elements up to the final element, but presumably this element is the only one possessing a zero sum. This can then be the signal for the weights on an element to change—a change of final output from right to wrong accompanied simultaneously by a zero sum in the element itself.

After such a weight change, the final output will be correct once more and the bias can again proceed to fall. Before it reaches zero, this process may occur a number of times throughout the network. When the bias finally stands at zero with the final output correct, the network is ready for the next input. Of course if -1 is desired, the bias will change in the opposite direction.
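The bias procedure described above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the text: weights are integers and the bias moves in unit steps, so an element that has "just passed through zero" has a total of exactly 0 (falling bias) or 1 (rising bias); every element found at that total when the final output goes wrong is incremented; the increment is the assumed fixed-increment rule with interconnecting weights clipped at zero; and the final output is a fixed majority of the last m elements, which receives no bias. All function names are hypothetical.

```python
import random
from itertools import product

def evaluate(net, x, b=0):
    """Outputs and totals (weighted sum plus common bias b) of the variable
    elements; element k sees the inputs x and the outputs of elements 0..k-1."""
    outs, totals = [], []
    for c0, w_in, w_el in net:
        t = (c0 + b + sum(w * xi for w, xi in zip(w_in, x))
             + sum(w * o for w, o in zip(w_el, outs)))
        totals.append(t)
        outs.append(1 if t > 0 else -1)
    return outs, totals

def final_output(outs, m):
    """Fixed majority function of the last m elements (m odd); no bias."""
    return 1 if sum(outs[-m:]) > 0 else -1

def increment(elem, x, prev_outs, d):
    """Assumed fixed-increment rule; interconnecting weights stay >= 0."""
    c0, w_in, w_el = elem
    return (c0 + d,
            [w + d * xi for w, xi in zip(w_in, x)],
            [max(0, w + d * o) for w, o in zip(w_el, prev_outs)])

def present(net, x, d, m):
    """Present input x with desired final output d; True if it was an error."""
    if final_output(evaluate(net, x)[0], m) == d:
        return False
    b = 0
    while final_output(evaluate(net, x, b)[0], m) != d:
        b += d                      # drive the bias until the output is corrected
    edge = 0 if d == 1 else 1       # total of an element that just crossed zero
    while b != 0:
        b -= d                      # relax the bias one unit toward zero
        while True:
            outs, totals = evaluate(net, x, b)
            if final_output(outs, m) == d:
                break
            culprits = [k for k, t in enumerate(totals) if t == edge]
            if not culprits:
                raise RuntimeError("network hung up")
            for k in culprits:
                net[k] = increment(net[k], x, outs[:k], d)
    return True

def make_net(n, size, rng):
    """Random integer weights; interconnecting weights start non-negative."""
    return [(rng.randint(-2, 2),
             [rng.randint(-2, 2) for _ in range(n)],
             [rng.randint(0, 2) for _ in range(k)]) for k in range(size)]

def train(net, target, n, m, max_passes=200):
    """Return (passes, errors); passes is None if max_passes was not enough."""
    errors = 0
    for passes in range(1, max_passes + 1):
        wrong = sum(present(net, x, target(x), m)
                    for x in product((-1, 1), repeat=n))
        errors += wrong
        if wrong == 0:
            return passes, errors
    return None, errors

net = make_net(3, 7, random.Random(0))
parity = lambda x: 1 if x[0] * x[1] * x[2] > 0 else -1
print(train(net, parity, 3, 3))
```

In this sketch each call to `present` leaves the final output correct for the input just shown once the bias is back at zero. Nothing in the sketch guarantees that a whole truth table is eventually learned, which is why `train` carries a pass limit.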

It is possible that extending the weight change process a little past the zero bias level may have beneficial results. This might increase the life expectancy of each learned input-output combination and thereby reduce the total number of errors. This is because the method [Pg 41] used above can stop the weight correction process so that even though the final output is correct, some elements whose outputs are essential to the final output have sums close to zero, which are easily changed by subsequent weight changes.

It will be noted that this method conforms to all three considerations mentioned previously. First, by furnishing each element the same bias, and by not changing weights until the final output becomes incorrect with dropping bias, there is a strong tendency to select elements which, with b = 0, would have sums close to zero. But the size of the sum in an element is a good measure of the amount of damage done to an element for other inputs if its current output is to be changed. Second, it is obvious that each element changed has had a demonstrable effect on the final output. Finally, there will be a clear tendency to change only a minimum of elements because changes never occur until the output clearly requires a change.

On the other hand this method requires little more added complexity to the network than it already has. Each element requires a bias, an error signal, and the desired final output, these things being uniform for all elements in a network. Some external device must manipulate the bias properly, but this is a simple behavior depending only on an error signal and the desired final output—not on the state of individual elements in the network. What one has, then, is a network consisting of elements which are nearly autonomous as regards their decisions to change weights. Such a scheme appears to be the only way to avoid constructing a central weight-change decision apparatus of great complexity. This rather sophisticated decision is made possible by utilizing the computational capabilities the network already possesses in producing outputs from inputs.

It should be noted here that this varying bias method requires that the variable bias be furnished to just those elements which have variable weights and to no others. Any fixed portion of the network, such as preliminary layers or final majority function for example, must operate independently of the variable bias. Otherwise, the final output may go from right to wrong as the bias moves towards zero and no variable-weight element be to blame. In such a case the network would be hung up.

Logical Redundancy in the Network

A third aspect of the network model is that for all the care taken in the previous steps, they will not suffice in settling quickly to a set [Pg 42] of weights that will generate the required logical function unless there is a great multiplicity of ways in which this can be done. This is to say that a learning network needs to have an excess margin of weights and elements beyond the minimum required to generate the functions which are to be learned.

This is analogous to the situation that prevails for a single element as regards the allowed range of values on its weights. It can be shown, for example, that any function for n = 6 that can be generated by a single element can be obtained with each weight restricted to the range of integer values -9, -8, ..., +9. Yet no modification of the stated weight change rule is known which restricts weight values to this range and still has any chance of ever learning most such functions.

Fatigued Elements

It would appear from some of the preliminary results of network simulations that it may be useful to have elements become “fatigued” after undergoing an excessive number of weight changes. Experiments have been performed on simplifications of the model described so far which had the occasional result that a small number of elements came to a state where they received most of the weight increments, much to the detriment of the learning process. In such cases the network behaves as if it were composed of many fewer adjustable elements. In a sense this is asking each element to maintain a record of the data it is being asked to store so that it does not attempt to exceed its own information capacity.

It is not certain just how this fatigue factor should enter in the element’s actions, but if it is to be compatible with the variable bias method, this fatigue factor must enter into the element’s response to a changing bias. Once an element changes state with zero sum at the same time that the final output becomes wrong, incrementing must occur if the method is to work. Hence a “fatigued” element must respond less energetically to a change of bias, perhaps with a kind of variable factor to be multiplied by the bias term.

NETWORK STRUCTURE

It is felt that the problem of selecting the structure of interconnections for a network is intimately connected to the previously mentioned problem of generalization. Presumably a given type of generalization can be obtained by providing appropriate fixed [Pg 43] portions of the network and an appropriate interconnection structure for the variable portion. However, for very large networks, it is undoubtedly necessary to restrict the complexity so that it can be specified by relatively simple rules. Since very little is known about this quite important problem, no further discussion will be attempted here.

COMPUTER SIMULATION RESULTS

A computer simulation of some of the network features previously described has been made on an IBM 7090. Networks with an excess of elements and with only positive interconnecting weights were used. However, in place of the variable bias method, a simple choice of the element of sum closest to, and on the wrong side of, zero was made without regard to the effectiveness of the element in correcting the final output. No fatigue factors were used.
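The simplified choice used in the simulation can be written in a few lines. The sketch below assumes that the current sums of the variable elements are available as a list and that ties are broken in favor of the lower index; both the function name and the tie-breaking are assumptions.

```python
def pick_element(sums, d):
    """Index of the element whose sum lies on the wrong side of zero for the
    desired output d (+1 or -1) and is closest to zero, or None if every
    element already agrees with d. A sum of exactly zero counts as output -1."""
    wrong = [(abs(s), k) for k, s in enumerate(sums)
             if (1 if s > 0 else -1) != d]
    return min(wrong)[1] if wrong else None

print(pick_element([4, -1, -6, 0, 3], 1))    # candidates are elements 1, 2, 3
print(pick_element([4, -1, -6, 0, 3], -1))   # candidates are elements 0, 4
```

Unlike the variable bias method, this choice looks only at the size of each sum and never checks whether changing the chosen element would actually affect the final output.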

The results of these simulations are very encouraging, but at the same time indicate the need for the more sophisticated methods. No attempt will be made here to describe the results completely.

In one series of learning experiments, a 22-element network was used which had three layers, 10 elements on the first, 11 on the second, and 1 on the third. The single element on the third was the final output, and was a fixed majority function of the 11 elements in the second layer. These in turn each received inputs from each of the 10 on the first layer and from each of the 6 basic inputs. The 10 on the first layer each received only the 6 basic inputs. A set of four logical functions, A, B, C, and D, was used. Function A was actually a linear threshold function which could be generated by the weights 8, 7, 6, 5, 4, 3, 2; functions B and C were chosen by randomly filling in a truth table, while D was the parity function.

TABLE I

      A           B           C           D
    r    e      r    e      r    e      r    e
    5   54      8  100     11  101      4   52
    4   37      9   85      4   60      5   62
    4   44      6   72      9   85      6   56


[Pg 44] Table I gives the results of one series of runs with these functions and this network, starting with various random initial weights. The quantity, r, is the number of complete passes through the 64-entry truth table before the function was completely learned, while e is the total number of errors made. In evaluating the results it should be noted that an ideal learning device would make an average of 32 errors altogether on each run. The totals recorded in these runs are agreeably close to this ideal. As expected, the linear threshold function is the easiest to learn, but it is surprising that the parity function was substantially easier than the two randomly chosen functions.

Table II gives a chastening result of the same experiment with all interconnecting weights removed except that the final element is a fixed majority function of the other 21 elements. Thus there was adaptation on one layer only. As can be seen, Table I is hardly better than Table II, so that the value of variable interconnecting weights was not being fully realized. In a later experiment the number of elements was reduced to 12 and the same functions used. In this case the presence of extra interconnecting weights actually proved to be a hindrance! However, a close examination of the incrementing process brought out the fact that the troublesome behavior was due to the greater chance of having only a few (often only one) elements do nearly all the incrementing. It is expected that the use of the additional refinements discussed herein will produce a considerable improvement in bringing out the full power of adaptation in multiple layers of a network.

TABLE II

      A           B           C           D
    r    e      r    e      r    e      r    e
    7   47     18  192      8  110      4   48
    3   40      7   69     10   98      6   68
    4   43      7   82      4   47      6   46
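The comparison between the two tables can be made concrete by averaging the error counts per function. The sketch below transcribes the e columns of Tables I and II, reading each row as four (r, e) pairs for functions A to D, which is an assumption about the flattened layout; the ideal figure of 32 is half of the 64-entry truth table.

```python
# Error counts e for the three runs of each function, transcribed from the
# tables (column-to-function assignment is an assumed reading of the layout).
ERRORS = {
    "Table I":  {"A": [54, 37, 44], "B": [100, 85, 72],
                 "C": [101, 60, 85], "D": [52, 62, 56]},
    "Table II": {"A": [47, 40, 43], "B": [192, 69, 82],
                 "C": [110, 98, 47], "D": [48, 68, 46]},
}
IDEAL = 64 / 2  # one chance-level guess per truth-table entry

def mean_errors(table):
    """Mean number of errors per function for the named table."""
    return {f: sum(e) / len(e) for f, e in ERRORS[table].items()}

for name in ERRORS:
    means = mean_errors(name)
    print(name, {f: round(v, 1) for f, v in means.items()},
          "ratio to ideal:", {f: round(v / IDEAL, 2) for f, v in means.items()})
```

On this reading the per-function means are 45, about 86, 82 and about 57 errors for Table I against about 43, about 114, 85 and 54 for Table II, which is consistent with the remark that Table I is hardly better than Table II.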
 

FUTURE PROBLEMS

Aside from the previous question of deciding on network structure, there are several other questions that remain to be studied in learning networks.

There is the question of requiring more than a single output from a network. If, say, two outputs are required for a given input, one +1 and the other -1, this runs into conflict with the incrementing process. Changes that aid one output may act against the other. [Pg 45] Apparently the searching process depicted before with a varying bias must be considerably refined to find weight changes which act on all the outputs in the required way. This is far from an academic question because there will undoubtedly be numerous cases in which the greatest part of the input-output computation will have shared features for all output variables. Only at later levels do they need to be differentiated. Hence it is necessary to envision a single network producing multiple outputs rather than a separate network for each output variable if full efficiency is to be achieved.

Another related question is that of using input variables that are either many-, or continuous-, valued rather than two-valued. No fundamental difficulties are discernible in this case, but the matter deserves some considerable study and experimentation.

Another important question involves the use of a succession of inputs for producing an output. That is, it may be useful to allow time to enter into the network’s logical action, thus giving it a “dynamic” as well as “static” capability.


[Pg 46]

Adaptive Detection of Unknown
Binary Waveforms

J. J. Spilker, Jr.

Philco Western Development Laboratories
Palo Alto, California

This work was supported by the Philco WDL Independent Development Program. This paper, submitted after the Symposium, represents a more detailed presentation of some of the issues raised in the discussion sessions at the Symposium and hence, constitutes a worthwhile addition to the Proceedings.

INTRODUCTION

One of the most important objectives in processing a stream of data is to determine and detect the presence of any invariant or quasi-invariant “features” in that data stream. These features are often initially unknown and must be “learned” from the observations. One of the simplest features of this form is a finite length signal which occurs repetitively, but not necessarily periodically with time, and has a waveshape that remains invariant or varies only slowly with time.

In this discussion, we assume that the data stream has been pre-processed, perhaps by a detector or discriminator, so as to exhibit this type of repetitive (but unknown) waveshape or signal structure. The observed signal, however, is perturbed by additive noise or other disturbances. It is desired to separate the quasi-invariance of the data from the truly random environment. The repetitive waveform may represent, for example, the transmission of an unknown sonar or radar, a pulse-position modulated noise-like waveform, or a repeated code word.

The problem of concern is to estimate the signal waveshape and to determine the time of each signal occurrence. We limit this discussion to the situation where only a single repetitive waveform is present and the signal sample values are binary. The observed waveform is assumed to be received at low signal-to-noise ratio so that a single observation of the signal (even if one knew precisely the arrival time) is not sufficient to provide a good estimate of the signal waveshape. The occurrence time of each signal is assumed to be random. [Pg 47]

THE ADAPTIVE DETECTION MACHINE

The purpose of this note is to describe very briefly a machine[2] which has been implemented to recover the noise-perturbed binary waveform. A simplified block diagram of the machine is shown in Figure 1. The experimental machine has been designed to operate on signals of 10³ samples duration.

Each analog input sample enters the machine at left and may either contain a signal sample plus noise or noise alone. In order to permit digital operation in the machine, the samples are quantized in a symmetrical three-level quantizer. The samples are then converted to vector form, e.g., the previous 10³ samples form the vector components. A new input vector, Y⁽ⁱ⁾, is formed at each sample instant.

Define the signal sample values as s₁, s₂, ..., sₙ. The observed vector Y⁽ⁱ⁾ is then either (a) perfectly centered signal plus noise, (b) shifted signal plus noise, or (c) noise alone.

  (Y⁽ⁱ⁾)ᵗ = (s₁, s₂, ..., sₙ) + (n₁, n₂, ..., nₙ)   (a)
  (Y⁽ⁱ⁾)ᵗ = (0, ..., s₁, s₂, ..., sₙ₋ⱼ) + (n₁, n₂, ..., nₙ)   (b)
  (Y⁽ⁱ⁾)ᵗ = (0, ..., 0) + (n₁, n₂, ..., nₙ)   (c)
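The formation of the input vector can be sketched as a three-level quantizer followed by a sliding window over the most recent samples. The quantizer threshold of 0.5 and the zero-filled initial window are assumptions made for the illustration, as are the function names; the experimental machine uses a window of 10³ samples rather than the 3 shown here.

```python
from collections import deque

def quantize(sample, threshold=0.5):
    """Symmetrical three-level quantizer giving -1, 0 or +1.
    The threshold value is assumed for illustration."""
    if sample > threshold:
        return 1
    if sample < -threshold:
        return -1
    return 0

def input_vectors(samples, n):
    """Yield, at each sample instant, the vector of the n most recent
    quantized samples (oldest first). The window starts filled with zeros."""
    window = deque([0] * n, maxlen=n)
    for s in samples:
        window.append(quantize(s))
        yield list(window)

for y in input_vectors([0.9, -0.2, -1.3, 0.6], 3):
    print(y)
```

Each new sample shifts the window by one position, so a signal occurrence passes through every shifted alignment of case (b) before and after it is perfectly centered as in case (a).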

At each sample instant, two measurements are made on the input vector, an energy measurement ‖Y⁽ⁱ⁾‖² and a polarity coincidence cross-correlation with the present estimate of the signal vector stored in memory. If the weighted sum of the energy and cross-correlation measurements exceeds the present threshold value Γᵢ, the input vector is accepted as containing the signal (properly shifted in time), and the input vector is added to the memory. The adaptive memory has 2^Q levels: 2^(Q-1) positive levels, 1 zero level, and 2^(Q-1) - 1 negative levels. New contributions are made to the memory by normal vector addition except that saturation occurs when a component value is at the maximum or minimum level.

The acceptance or rejection of a given input vector is based on a hypersphere decision boundary. The input vector is accepted if the weighted sum γᵢ exceeds the threshold Γᵢ

γᵢ = Y⁽ⁱ⁾∙M⁽ⁱ⁾ + α‖Y⁽ⁱ⁾‖² ⩾ Γᵢ.
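The acceptance test and the saturating memory update can be sketched together. The sketch takes the weighted sum exactly as written above and reads the memory as having 2^Q levels running from -(2^(Q-1) - 1) to +2^(Q-1); that reading of the level counts, the function names, and the numerical values in the example are assumptions, and the adaptation of the threshold itself is left out.

```python
def gamma(y, m, alpha):
    """Weighted sum of the cross-correlation Y.M and the energy ||Y||^2."""
    return (sum(yi * mi for yi, mi in zip(y, m))
            + alpha * sum(yi * yi for yi in y))

def update_memory(y, m, alpha, threshold, q):
    """Return (accepted, new_memory). When gamma >= threshold the input vector
    is added to the memory component by component, saturating at the assumed
    extreme levels +2**(q-1) and -(2**(q-1) - 1)."""
    if gamma(y, m, alpha) < threshold:
        return False, list(m)
    hi, lo = 2 ** (q - 1), -(2 ** (q - 1) - 1)
    return True, [max(lo, min(hi, mi + yi)) for yi, mi in zip(y, m)]

y = [1, -1, 0, 1]
m = [3, -3, 0, 4]
print(gamma(y, m, 0.5))
print(update_memory(y, m, 0.5, 10, 3))
print(update_memory(y, m, 0.5, 12, 3))
```

With a blank memory the cross-correlation term vanishes, so the first acceptance can only come from the energy term once the threshold has decayed far enough.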

[Pg 48]

Figure 1—Block diagram of the adaptive binary waveform detector

[Pg 49] Geometrically, we see that the input vector is accepted if it falls on or outside of a hypersphere centered at

  C⁽ⁱ⁾ = -M⁽ⁱ⁾/(2α)

having radius squared

  [r⁽ⁱ⁾]² = Γᵢ/α + ‖M⁽ⁱ⁾‖²/(2α)².

Both the center and radius of this hypersphere change as the machine adapts. The performance and optimality of hypersphere-type decision boundaries have been discussed in related work by Glaser[3] and Cooper.[4]
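The hypersphere form follows from the acceptance inequality by completing the square. The short derivation below assumes α > 0, which the text does not state explicitly; with that assumption the accepted region is the outside of a sphere whose center lies at -M⁽ⁱ⁾/(2α).

```latex
\begin{aligned}
\gamma_i = Y^{(i)}\cdot M^{(i)} + \alpha\,\|Y^{(i)}\|^2
  &= \alpha\left\|Y^{(i)} + \frac{M^{(i)}}{2\alpha}\right\|^2
     - \frac{\|M^{(i)}\|^2}{4\alpha}, \\
\gamma_i \ge \Gamma_i
  &\iff \left\|Y^{(i)} + \frac{M^{(i)}}{2\alpha}\right\|^2
     \ge \frac{\Gamma_i}{\alpha} + \frac{\|M^{(i)}\|^2}{(2\alpha)^2}
  \qquad (\alpha > 0).
\end{aligned}
```

The right-hand side of the last line is the radius squared quoted above, and it grows both when the threshold rises and when the memory vector becomes longer.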

The threshold value, Γᵢ, is adapted so that it increases if the memory becomes a better replica of the signal with the result that γᵢ increases. On the other hand, if the memory is a poor replica of the signal (for example, if it contains noise alone), it is necessary that the threshold decay with time to the point where additional acceptances can modify the memory structure.

The experimental machine is entirely digital in operation and, as stated above, is capable of recovering waveforms of up to 10³ samples in duration. In a typical experiment, one might attempt to recover an unknown noise-perturbed, pseudo-random waveform of up to 10³ bits duration which occurs at random intervals. If no information is available as to the signal waveshape, the adaptive memory is blank at the start of the experiment.

In order to illustrate the operation of the machine most clearly, let us consider a repetitive binary waveform which is composed of 10³ bits of alternate “zeros” and “ones.” A portion of this waveform is shown in Figure 2a. The waveform actually observed is a noise-perturbed version of this waveform, shown in Figure 2b at -6 db signal-to-noise ratio. The exact sign of each of the signal bits obviously could not be accurately determined by direct observation of Figure 2b.

Figure 2—Binary signal with additive noise at -6 db SNR: (a) Binary signal; (b) Binary signal plus noise

[Pg 50]

Figure 3—Adaption of the memory at -6 db SNR: (a) Blank initial memory; (b) Memory after first dump; (c) Memory after 12 dumps; (d) Memory after 40 dumps; (e) Perfect “checkerboard” memory for comparison

As the machine memory adapts to this noisy input signal, it progresses as shown in Figure 3. The signs of the 10³ memory components are displayed in a raster pattern in this figure. Figure 3a shows the memory in its blank initial state at the start of the adaption process. Figure 3b shows the memory after the first adaption of the memory. This first “dump” occurred after the threshold had decayed to the point where an energy measurement produced an acceptance decision. Figure 3c [Pg 51] and 3d show the memory after 12 and 40 adaptions, respectively. These dumps, of course, are based on both energy and cross-correlation measurements. As can be seen, the adapted memory after 40 dumps is already quite close to the perfect memory shown by the “checkerboard” pattern of Figure 3e.

The detailed analysis of the performance of this type of machine vs. signal-to-noise ratio, average signal repetition rate, signal duration, and machine parameters is extremely complex. Therefore, it is not appropriate here to detail the results of the analytical and experimental work on the performance of this machine. However, several conclusions of a general nature can be stated.

(a) Because the machine memory is always adapting, there is a relatively high penalty for “false alarms.” False alarms can destroy a perfect memory. Hence, the threshold level needs to be set appropriately high for the memory adaption. If one wishes to detect signal occurrences with more tolerance to false alarms, a separate comparator and threshold level should be used.

(b) The present machine structure, which allows for slowly varying changes in the signal waveshape, exhibits a marked threshold effect in steady-state performance at an input signal-to-noise ratio (peak signal power-to-average noise power ratio) of about -12 db. Below this signal level, the time required for convergence increases very rapidly with decreasing signal level. At higher SNR, convergence to noise-like signals, having good auto-correlation properties, occurs at a satisfactory rate.

A more detailed discussion of performance has been published in the report cited in footnote reference 1.


[Pg 52]

Conceptual Design of Self-Organizing Machines

P. A. Kleyn

Northrop Nortronics
Systems Support Department
Anaheim, California

Self-organization is defined and several examples which motivate this definition are presented. The significance of this definition is explored by comparison with the metrization problem discussed in the companion paper (1) and it is seen that self-organization requires decomposing the space representing the environment. In the absence of a priori knowledge of the environment, the self-organizing machine must resort to a sequence of projections on unit spheres to effect this decomposition. Such a sequence of projections can be provided by repeated use of a nilpotent projection operator (NPO). An analog computer mechanization of one such NPO is discussed and the signal processing behavior of the NPO is presented in detail using the Euclidean geometrical representation of the metrizable topology provided in the companion paper. Self-organizing systems using multiple NPO’s are discussed and current areas of research are identified.

INTRODUCTION

Unlike the companion paper which considers certain questions in depth, this paper presents a survey of the scope of our work in self-organizing systems and is not intended to be profound.

The approach we have followed may be called phenomenological (Figure 1). That is, the desired behavior (self-organization) was defined, represented mathematically, and a mechanism(s) required to yield the postulated behavior was synthesized using mathematical techniques. One advantage of this approach is that it avoids assumptions of uniqueness of the mechanism. Another advantage is that the desired behavior, which is after all the principal objective, is taken as invariant. An obvious disadvantage is the requirement for the aforementioned synthesis technique; fortunately in our case a sufficiently general technique had been developed by the author of the companion paper.

From the foregoing and from the definition of self-organization we employ (see conceptual model), it would appear that our research does [Pg 53] not fit comfortably within any of the well publicized approaches to self-organization (2). Philosophically, we lean toward viewpoints expressed by Ashby (3), (4), Hawkins (5), and Mesarovic (6) but with certain reservations. We have avoided the neural net approach partly because it is receiving considerable attention and also because the brain mechanism need not be the unique way to produce the desired behavior.

Figure 1—Approach used in Nortronics research on self-organizing systems

Nor have we followed the probability computer or statistical decision theory approach exemplified by Braverman (7) because these usually require some sort of preassigned coordinate system (8). Neither will the reader find much indication of formal logic (9) or heuristic (10) programming. Instead, we view a self-organizing system more as a mirror whose appearance reflects the environment rather than its own intrinsic nature. With this viewpoint, a self-organizing system appears very flexible because it possesses few internal constraints which would tend to distort the reflection of the environment and hinder its ability to adapt.

CONCEPTUAL MODEL

Definition

A system is said to be self-organizing if, after observing the input and output of an unknown phenomenon (transfer relation), the system organizes itself into a simulation of the unknown phenomenon.

Implicit in this definition is the requirement that the self-organizing machine (SOM) not possess a preassigned coordinate system. In fact it is just this ability to acquire that coordinate system implicit in the input-output spaces which define the phenomenon that we designate as [Pg 54] self-organization. Thus any a priori information programmed into the SOM by means of, for example, stored or wired programs, constrains the SOM and limits its ability to adapt. We do not mean to suggest that such preprogramming is not useful or desirable; merely that it is inconsistent with the requirement for self-organization. As shown in Figure 2, it is the given portion of the environment which the SOM is to simulate that, via the defining end spaces, furnishes the SOM with all the data it needs to construct the coordinate system intrinsic to those spaces.

The motivation for requiring the ability to simulate as a feature of self-organization stems from the following examples.

Consider the operation of driving an automobile. Figure 3 depicts the relation characterized by a set of inputs (steering, throttle, brakes, transmission) and a set of outputs (the trajectory). Operation of the automobile requires a device (SOM) which for a desired trajectory can furnish those inputs which realize the desired trajectory. In order to provide the proper inputs to the automobile, the SOM must contain a simulation of f⁻¹(x).

Figure 2—Simulation of (a portion of) the environment

Figure 3—Simulation of a relation

Since f(x) is completely defined in terms of the inputs and the resulting trajectories, exposure to them provides the SOM with all the information necessary to simulate f⁻¹(x). And if the SOM possesses internal processes which cause rearrangement of the input-output relation of the SOM to correspond to f⁻¹(x) in accordance with the observed data, the SOM can operate an automobile. It is this internal change which is implied by the term “self-organizing,” but note that the [Pg 55] instructions which specify the desired organization have their source in the environment.
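
The idea of simulating the inverse relation purely from observed data can be sketched as a lookup table. This is only an illustration under strong assumptions: the inputs and trajectories are discrete labels, the sample pairs are hypothetical, and the function names `observe` and `controls_for` are invented here. It is not the mechanism proposed in this paper.

```python
from collections import defaultdict


def observe(relation_samples):
    """Build a table for the inverse relation from observed (input, output) pairs."""
    inverse = defaultdict(set)
    for control, trajectory in relation_samples:
        inverse[trajectory].add(control)
    return inverse


# Hypothetical observations of the relation: control input -> resulting trajectory.
samples = [
    ("steer_left", "curve_left"),
    ("steer_right", "curve_right"),
    ("brake", "stop"),
    ("throttle", "straight_ahead"),
]

inverse_table = observe(samples)


def controls_for(desired_trajectory):
    """Return the inputs that were observed to produce the desired trajectory."""
    return sorted(inverse_table.get(desired_trajectory, ()))
```

Everything the table contains comes from the observed pairs; a trajectory that was never observed simply yields no inputs.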

As a second example consider adaptation to the environment. Adapt (from Webster) means: “to change (oneself) so that one’s behavior, attitudes, etc., will conform to new or changed circumstances. Adaptation in biology means a change in structure, function or form that produces better adjustment to the environment.” These statements suggest a simulation because adjustment to the environment implies survival by exposing the organism to the beneficial rather than the inimical effects of the environment. If we represent the environment (or portion thereof) as a relation as shown in Figure 2, we note that the ability to predict what effect a given disturbance will have is due to a simulation of the cause-effect relation which characterizes the environment.

It would be a mistake to infer from these examples that simulation preserves the appearance of the causes and effects which characterize a relation. We clarify this situation by examining a relation and its simulation.

Consider the relation between two mothers and their sons as pictured in Figure 4. Observe that if symbols (points) are substituted for the actual physical objects (mothers and sons), the relation is not altered in any way. This is what we mean by simulation and this is how a SOM simulates. It is not even necessary that the objects, used to display the relation, be defined; i.e., these objects may be primitive. (If this were not so, no mathematical or physical theory could model the environment.) The main prerequisite is sufficient resolution to distinguish the objects from each other.
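
A small sketch may make the substitution of symbols concrete. The names of the mothers and sons, and the point labels, are hypothetical (the figure does not name them). The only claim illustrated is that a one-to-one renaming leaves the relation intact.

```python
def relabel(relation, naming):
    """Replace every object in a set of ordered pairs by its assigned symbol."""
    return {(naming[a], naming[b]) for a, b in relation}


# Hypothetical display of the relation "is the mother of".
mother_of = {("Anna", "Tom"), ("Beth", "Jim")}

# One-to-one assignment of primitive symbols (points) to the objects.
naming = {"Anna": "p1", "Beth": "p2", "Tom": "q1", "Jim": "q2"}
inverse_naming = {symbol: obj for obj, symbol in naming.items()}

simulated = relabel(mother_of, naming)          # relation among symbols
recovered = relabel(simulated, inverse_naming)  # back to the original objects
```

Because the naming distinguishes every object from every other, `recovered` equals `mother_of`; if two objects shared a symbol, the resolution requirement stated above would be violated and the relation could not be recovered.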

Figure 4—A relation of objects—displayed and simulated

[Pg 56]

MATHEMATICAL MODEL

The mathematical model must represent both the environment and the SOM and for reasons given in the companion paper each is represented as a metrizable topology. For uniqueness we factor each space into equal parts and represent the environment as the channel

W ⟶ X. (Ref. 10a)

Consider now the SOM to be represented by the cascaded channels

X ⟶ Y ⟶ Z

where X ⟶ Y is a variable which represents the reorganization of the SOM’s existing input-output relation represented by Y ⟶ Z.

The solution of the three channels-in-cascade problem

W ⟶ X ⟶ Y ⟶ Z,

where p(W) (11), p(X), p(X|W), p(Y), p(Z), p(Z|Y) are fixed, yields that middle channel p₀(Y|X), from a set of permissible middle channels {p(Y|X)}, which maximizes R(Z,W).

Then the resulting middle channel describes that reorganization of the SOM which yields the optimum simulation of W ⟶ X by the SOM, within the constraints upon Ch(Z,Y).

The solution (the middle channel) depends of course on the particular end channels. Obviously the algorithm which is used to find the solution does not. It follows that if some physical process were constrained to carrying out the steps specified by the algorithm, said process would be capable of simulation and would exhibit self-organization.
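
The selection of a middle channel can be sketched for a toy case. The following is a brute-force illustration, not the algorithm referred to in the text. It assumes finite alphabets, takes R(Z,W) to be the discrete mutual information in nats, restricts the permissible middle channels to the permutations of three symbols, and ignores the requirement that p(Y) and p(Z) be fixed. All numerical values are hypothetical.

```python
import itertools

import numpy as np


def mutual_information(p_w, p_z_given_w):
    """R(Z,W) in nats for a discrete source p(w) and channel rows p(z|w)."""
    joint = p_w[:, None] * p_z_given_w           # p(w, z)
    p_z = joint.sum(axis=0)                      # marginal p(z)
    product = p_w[:, None] * p_z[None, :]        # p(w) p(z)
    mask = joint > 0                             # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def best_middle_channel(p_w, p_x_given_w, p_z_given_y, candidates):
    """Pick the candidate p(y|x) that maximizes R(Z,W) for the cascade."""
    best, best_rate = None, -np.inf
    for p_y_given_x in candidates:
        cascade = p_x_given_w @ p_y_given_x @ p_z_given_y   # p(z|w)
        rate = mutual_information(p_w, cascade)
        if rate > best_rate:
            best, best_rate = p_y_given_x, rate
    return best, best_rate


# Hypothetical end channels: W -> X is noiseless, while Y -> Z passes
# symbol 0 cleanly and confuses symbols 1 and 2 completely.
p_w = np.array([0.5, 0.25, 0.25])
p_x_given_w = np.eye(3)
p_z_given_y = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.5, 0.5],
                        [0.0, 0.5, 0.5]])

# Permissible middle channels: row i of each matrix sends x = i to y = perm[i].
candidates = [np.eye(3)[list(perm)] for perm in itertools.permutations(range(3))]

best, rate = best_middle_channel(p_w, p_x_given_w, p_z_given_y, candidates)
```

In this toy case the search routes the most probable source symbol through the one clean path of the output channel, and the resulting rate is ln 2 nats. The same search routine would serve for any other pair of end channels, which is the sense in which the procedure does not depend on them.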

Although the formal solution to the three-channels-in-cascade problem is not complete, the solution is sufficiently well characterized to permit proceeding with a mechanization of the algorithm. A considerable portion of the solution is concerned with the decomposition and metrization of channels and it is upon this feature that we now focus attention.

As suggested in the companion paper, if the dimensionality of the spaces is greater than one, the SOM has only one method available (12). Consider the decomposition of a space without, for the moment, making the distinction between input and output.

Figure 5 depicts objects represented by a (perhaps multidimensional) “cloud” of points. In the absence of a preassigned coordinate system, [Pg 57] the SOM computes the center of gravity of the cloud (which can be done in any coordinate system) and describes the points in terms of the distance from this center of gravity; or, which is the same, as concentric spheres with origin at the center of gravity.

Figure 5—Nilpotent decomposition of a three-dimensional space

The direction of a particular point cannot be specified for there is no reference radius vector. Since the SOM wants to end up with a cartesian coordinate system, it must transform the sphere (a two-dimensional surface) into a plane (a two-dimensional surface). Unfortunately, a sphere is not homeomorphic to a plane; thus the SOM has to decompose the sphere into a cartesian product of a hemisphere (12a) and a denumerable group. The SOM then can transform the hemisphere into a plane. The points projected onto the plane constitute a space of the same character as the one with which the SOM started. Thus, it can repeat all operations on the plane (a space of one less dimension) by finding the center of gravity and the circle upon which the desired point is situated. The circle is similarly decomposed into a line times a denumerable group. By repeating this operation as many times as the space has dimensions, the SOM eventually arrives at a single point and has obtained in the process a description of the space. Since this procedure can be carried on by the repeated use of one operator, this operator is nilpotent and to reflect this fact as well as the use of a projection, we have named this a nilpotent projection operator or NPO for short.
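
The repeated center-of-gravity, sphere, hemisphere, plane sequence can be imitated numerically. The sketch below is an illustration only and is not the NPO itself. It assumes the cloud is given in numerical coordinates (which the SOM does not have), uses the sign of the last coordinate as a two-element stand-in for the denumerable group, uses a flat orthographic projection of the hemisphere, and assumes no point coincides with a center of gravity. The function names are invented here.

```python
import numpy as np


def decompose(points):
    """Peel off one dimension per pass: center of gravity, radius, fold, project."""
    levels = []
    current = np.asarray(points, dtype=float)
    while current.shape[1] > 0:
        center = current.mean(axis=0)                 # center of gravity
        offset = current - center
        radius = np.linalg.norm(offset, axis=1)       # which concentric sphere
        unit = offset / radius[:, None]               # point on the unit sphere
        sign = np.where(unit[:, -1] >= 0, 1.0, -1.0)  # which hemisphere
        levels.append((center, radius, sign))
        current = unit[:, :-1]                        # project: one dimension less
    return levels


def reconstruct(levels):
    """Invert decompose() to show that the description loses nothing."""
    n_points = levels[0][1].shape[0]
    current = np.zeros((n_points, 0))
    for center, radius, sign in reversed(levels):
        # The dropped coordinate follows from the unit length and the stored sign.
        height = np.sqrt(np.clip(1.0 - np.sum(current**2, axis=1), 0.0, None))
        unit = np.column_stack([current, sign * height])
        current = center + radius[:, None] * unit
    return current


# Hypothetical three-dimensional cloud of 20 points.
cloud = np.random.default_rng(0).normal(size=(20, 3))
levels = decompose(cloud)
rebuilt = reconstruct(levels)
```

One pass is made per dimension, each pass applies the same operation to the output of the previous one, and the stored radii and signs suffice to rebuild the cloud.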

MECHANIZATION OF THE NPO

Analog computer elements were used to simulate one NPO which was tested [Pg 58] in the experimental configuration shown in Figure 6. The NPO operates upon a channel which is artificially generated from the two noise generators i₁ and i₂ and the signal generator i₀ (i₀ may also be a noise generator). The NPO accepts the inputs labelled X₁ and X₂ and provides the three outputs Ξ₁, Ξ₂, and γ. X₁ is the linear combination of the outputs of generators i₁ and i₀; similarly, X₂ is obtained from i₂ and i₀.

Figure 6—Experimental test configuration for the simulation of an NPO

Obviously, i₀ is an important parameter since it represents the memory relating the spaces X₁ and X₂. Ξ₁ has the property that the magnitude of its projection on i₀ is a maximum, while Ξ₂, on the contrary, has a zero projection on i₀. γ is the detected version of the eigenvalue of Ch(X₂,X₁).

In the companion paper it was shown how one can provide a Euclidean geometrical representation of the NPO. This representation is shown in Figure 7 which shows the vectors i₀, i₁, i₂, X₁, X₂, Ξ₁, Ξ₂, and the angles Θ₁, Θ₂, and γ. The length of a vector is given by

\[ |X| = \kappa_X (2\pi e)^{-1/2}\, e^{H(X)} \]

and the angle between two vectors by

\[ |\Theta(X_1, X_2)| = \sin^{-1} e^{-R(X_1, X_2)}. \]

The three vectors i₀, i₁, i₂ provide an orthogonal coordinate system because the corresponding signals are random, i.e.,

\[ R(i_0, i_1, i_2) \overset{\kappa}{\equiv} 0. \]
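
A numerical sketch of this geometry follows. It rests on one added assumption: all signals are zero-mean Gaussian. In that case the length and angle definitions of the companion paper (its Eqs. 5 and 6) reduce to the standard deviation and to an angle whose cosine is the magnitude of the correlation coefficient. The variances used are hypothetical and are not the values of Table I.

```python
import math


def length(variance):
    """|X| for a Gaussian signal; the definition reduces to the standard deviation."""
    return math.sqrt(variance)


def angle(var_a, var_b, cov_ab):
    """Theta = asin(exp(-R)), with R = -0.5 * ln(1 - rho**2) for a Gaussian pair."""
    rho_squared = cov_ab**2 / (var_a * var_b)
    return math.asin(math.sqrt(1.0 - rho_squared))


# Hypothetical generator powers (variances) for i0 and i1, taken as independent.
var_i0, var_i1 = 9.0, 16.0

# X1 is the sum of i1 and i0, so var(X1) = 25 and cov(X1, i0) = var(i0) = 9.
var_x1 = var_i0 + var_i1

theta_generators = angle(var_i0, var_i1, 0.0)   # independent signals
theta_x1_i0 = angle(var_x1, var_i0, var_i0)     # X1 against the i0 axis
```

Independent generators come out at a right angle, and X₁ behaves like the vector (3, 4) in the i₀ ⨉ i₁ plane: its length is 5 and the cosine of its angle to i₀ is 3/5.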

As external observers we have a priori knowledge of this coordinate [Pg 59] system; however, the NPO is given only the vectors X₁ and X₂ in the i₀ ⨉ i₁ and i₀ ⨉ i₂ planes. The NPO can reconstruct the entire geometry but the actual output Ξ obviously is constrained to lie in the plane of the input vector X. The following formulas are typical of the relations present.

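\[ \tan\beta = \frac{|\Xi_1|}{|\Xi_2|} \]

\[ \cos\Theta = \cos 2\beta \, \csc 2\gamma \]

\[ \cos 2\Theta_1 = -1 + 2\,\frac{\cos 2\beta}{1 - \cos 2\gamma} \]

\[ \cos\Theta = \cos\Theta_1 \cos\Theta_2. \]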

Figure 7—Geometry of the NPO

[Pg 60]

Figure 8—NPO run number 5

Figure 9—NPO run number 6

[Pg 61]

We have obtained a complete description of the NPO which involves 74 formulas. These treat the noise in the various outputs, invariances of the NPO and other interesting features. A presentation of these would be outside of the scope of this paper and would tend to obscure the main features of the NPO. Thus, we show here only a typical sample of the computer simulation, Figure 8 and Figure 9. Conditions for these runs are shown in Table I. Run No. 6 duplicates run No. 5 except for the fact that i₁ and i₂ were disabled in run No. 6.

Observe that all our descriptions of the NPO and the space it is to decompose have been time invariant while the signals shown in the simulation are presented as functions of time. The conversion may be effected as follows: Given a measurable (single-valued) function

\[ x = x(t), \qquad t \in T \]

where

\[ \mu(T) > 0 \]

we define the space

\[ X = \{\, x = x(t) : t \in T \,\} \]

and a probability distribution

\[ P(X') = \frac{\mu\bigl(x^{-1}(X')\bigr)}{\mu(T)}, \qquad X' \text{ open} \subset X \]

on that space.
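
The conversion can be tried numerically. In this sketch the measure μ is approximated by counting equally spaced samples of t, the open set X′ is an interval, and the sine-wave signal is a hypothetical example rather than one of the recorded traces.

```python
import numpy as np


def time_fraction(x, t, lower, upper):
    """Estimate P(X') = mu(x^{-1}(X')) / mu(T) for the open interval X' = (lower, upper)."""
    values = x(t)
    inside = (values > lower) & (values < upper)
    return float(np.mean(inside))   # fraction of T spent inside X'


# T = [0, 1), sampled uniformly; x(t) is one full cycle of a sine wave.
t = np.linspace(0.0, 1.0, 100_000, endpoint=False)
p = time_fraction(lambda s: np.sin(2 * np.pi * s), t, -0.5, 0.5)
```

The sine wave spends one third of each cycle between -0.5 and 0.5, so `p` is close to 1/3. The result does not refer to any particular instant t, which is the sense in which the description becomes time invariant.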

TABLE I
Legend for Traces of Figures 8 and 9

Trace Number   | 1       | 2       | 3           | 4   | 5          | 6      | 7
Symbol         | X₂      | X₁      | γ           | β   | i          | dξ₂/dτ | dξ₁/dτ

run No. 5
signal         | 7½ Vrms | 7½ Vrms | π ptop      |     | 35.6 m cps |        |
noise          | 16 Vrms | 15 Vrms | π/9 ptop[5] |     | sine wave  |        |
DC             | 0       | 0       |             |     |            |        |
power s/n      | 1/4     | 1/4     | 81/1        |     |            | 0      | 1/2[6]
terminal value |         |         | π/4         | π/4 |            |        |

run No. 6
signal         | 7½ Vrms | 7½ Vrms | π ptop      |     | 35.6 m cps |        |
noise          | 0       | 0       | 0[7]        |     | sine wave  |        |
DC             | -30V    | 0       |             |     |            |        |
power s/n      |         |         | 0           |     |            |        |
terminal value |         |         | π/4         | π/4 |            |        |


[Pg 62]

Then (X,p(X)) is a stochastic space in our usual sense and x(T) is a stochastic variable. Two immediate consequences are:

P(X) is stationary (P(X) is not a function of t ∊ T), and no question of ergodicity arises.

NETWORKS OF NPO’S

A network of NPO’s may constitute anything from a SOM to a preprogrammed detector, depending upon the relative amount of preprogramming included. Two methods of preprogramming are: (1) Feeding a signal out of a permanent storage into some of the inputs of the network of NPO’s. This a priori copy need not be perfect, because the SOM will measure the angles Θᵢ anyhow. (2) Feedback, which, after all, is just a way of taking advantage of the storage inherent in any delay line. (We implicitly assume that any reasonable physical realization of an NPO will include a delay T between the x input and the ξ output which is not less than perhaps 10⁻¹ times the time constant of the internal feedback loop in the γ computation.)

Simulation of channels that possess a discrete component requires feedback path(s) to generate the required free products of the finitely generated groups. Then, such a SOM converges to a maximal subgroup of the group describing the symmetry of the signal that is a free product available to this SOM.

Because a single NPO with 1 ≤ n₀ ≤ K₀ is isomorphic (provides the same input to output mapping) to a suitable network of NPO’s with n₀ = 1, it suffices to study only networks of NPO’s with n₀ = 1.

Figure 10 is largely self-explanatory. Item a is our schematic symbol for a single NPO with n₀ = 1. Items b, d (including larger feedback loops), and f are typical of artificial intelligence networks. Item c is employed to effect the level changing required in order to apply the three channels in cascade algorithm to the solution of one-dimensional coding problems. Observe that items c and e are the only configurations requiring the γ output. Item d may be used as a limiter by making T⁻¹ high compared to the highest frequency present in the signal. Observe that item e is the only application of NPO’s that requires either the ξ₂ or β outputs. Item f serves the purpose of handling higher power levels into and out of what effectively is a single (larger) NPO. [Pg 63]

Figure 10—Some possible networks of NPO’s

CONCLUSION

The definition of self-organizing behavior suitably represented has permitted the use of Information Theoretic techniques to synthesize a (mathematical) mechanism for a self-organizing machine. Physical mechanization in the form of an NPO has been accomplished and has introduced the experimental phase of the program. From among the many items deserving of further study we may mention: more economical physical mechanization through introduction of modern technology; identification of networks of NPO’s with their group theoretic descriptions; analysis of the dimensionality of tasks which a SOM might be called on to simulate, and prototype SOM applications to related tasks. It is hoped that progress along these lines can be reported in the future.


[Pg 64]

REFERENCES

1. Ścibor-Marchocki, Romuald I.,
  “A Topological Foundation for Self-Organization,”
  Anaheim, California:Northrop Nortronics, NSS Report 2828,
  November 14, 1963
2.

It is true that our definition is very similar to that proposed by Hawkins (reference 5). Compare for example his definition of learning machines (page 31 of reference 5). But the subsequent developments reviewed therein are different from the one we have followed.

3. Ashby, W. R.,
  “The Set Theory of Mechanism and Homeostasis,”
  Technical Report 7, University of Illinois,
  September 1962
4. Ashby, W. R.,
  “Systems and Information,”
  Transactions PTGME MIL-7:94-97
  (April-July, 1963)
5. Hawkins, J. K.,
  “Self-Organizing Systems—A Review and Commentary,”
  Proc. IRE. 49:31-48 (January 1961)
6. Mesarovic, M. D.,
  “On Self Organizational Systems,”
  Spartan Books, pp. 9-36, 1962
7. Braverman, D.,
  “Learning Filters for Optimum Pattern Recognition,”
  PGIT IT-8:280-285 (July 1962)
8.

We make the latter statement despite the fact that we employ a statistical treatment of self-organization. We may predict the performance of, for example, the NPO by using a statistical description, but it does not necessarily follow that the NPO computes statistics.

9. McCulloch, W. S., and Pitts, W.,
  “A Logical Calculus of the Ideas Immanent in Nervous Activity,”
  Bull. Math. Biophys. 5:115 (1943)
10. Newell, A., Shaw, J. C., and Simon, H. A.,
  “Empirical Explorations of the Logic Theory Machine:
  A Case Study in Heuristic,”
  Proc. WJCC, pp. 218-230, 1957
10a.

The spaces W, X, Y, and Z are stochastic spaces; that is, each space is defined as the ordered pair (X,p(X)) where p(X) = {p(x) ∋ x ∈ X}, p(x) ≥ 0, x ∈ X and ∫ₓ p(x)dx = 1. Such spaces possess a metrizable topology.

11.

We use the following convention for probability distributions: if the arguments of p( ) are different, they are different functions, thus: p(x) ≠ p(y) even if y = x.

12.

One can prove the existence of a metric directly but in order to perform the metrization the space has to be decomposed first. But decomposing a space without having a metric calls for a neat trick, accomplished (as far as we know) only by the method used by the SOM.

12a.

In this example we use a hemisphere; in general, it would be a spherical cap.


[Pg 65]

A Topological Foundation for
Self-Organization

R. I. Ścibor-Marchocki

Northrop Nortronics
Systems Support Department
Anaheim, California

It is shown that by the use of Information Theory, any metrizable topology may be metrized as an orthogonal Euclidean space (with a random Gaussian probability distribution) times a denumerable random cartesian product of irreducible (wrt direct product) denumerable groups. The necessary algorithm to accomplish this metrization from a statistical basis is presented. If such a basis is unavailable, a certain nilpotent projection operator has to be used instead, as is shown in detail in the companion paper. This operator possesses self-organizing features.

INTRODUCTION

In the companion article[8] we will define a self-organizing system as one which, after observing the input and output of an unknown phenomenon (transfer relation), organizes itself into a simulation of the unknown phenomenon.

Within the mathematical model, the aforementioned phenomenon may be represented as a topological space thus omitting for the moment the (arbitrary) designation of input and output which, as will be shown, bears on the question of uniqueness. Hence, for the purpose of this paper, which emphasizes the mathematical foundation, an intelligent device is taken as one which carries out the task of studying a space and describing it.

In keeping with the policy that one should not ask someone (or something) else to do a task that he could not do himself (at least in principle), let us consider how we would approach such a problem.

In the first place, we have to select the space in which the problem is to be set. The most general space that we feel capable of tackling is a metrizable topology. On the other hand, anything less general would be unnecessarily restrictive. Thus, we choose a metrizable topological space. [Pg 66]

As soon as we have made this choice, we regret it. In order to improve the situation somewhat, we show that there is no (additional) loss of generality in using an orthogonal Euclidean space times[9] a denumerable random cartesian product of irreducible (wrt direct product) denumerable groups.

This paper provides a survey of the problem and a method for solving it which is conceptually clear but not very practical. The companion paper[10] provides a practical method for solving this problem by means of the successive use of a certain nilpotent projection operator.

METRIZATION

We start with a metrizable topological space. There are many equivalent axiomatizations of a metrizable topology; e.g., see Kelley. Perhaps the easiest way to visualize a metrizable topology is to consider that one was given a metric space but that he lost his notes in which the exact form of the metric was written down. Thus one knows that he can do everything that he could in a metric space, if only he can figure out how.

The “figuring out how” is by no means trivial. Here, it will be assumed that a cumulative probability distribution has been obtained on the space by one of the standard methods: bird in cage,[11] Munroe I,[12] Munroe II,[13] or ordering (see Halmos[14] or Kelley[15]). This cumulative probability distribution is a function on X onto the interval [0,1] of real numbers. The inverse of this function, which exists by the Radon-Nikodym theorem, provides a mapping from the real interval onto the non-trivial portion of X. This mapping induces all of the pleasant properties of the real numbers on the space X: topological, metric, and ordering.
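
For a one-dimensional space the way a cumulative distribution carries ordering and distance over from the real interval can be sketched as follows. This is an illustration under assumptions that are not in the text: the distribution is estimated empirically from a handful of hypothetical samples, and the induced distance is taken to be the difference of the cumulative values.

```python
import numpy as np


def empirical_cdf(samples):
    """Return F, where F(x) is the fraction of samples less than or equal to x."""
    ordered = np.sort(np.asarray(samples, dtype=float))

    def cdf(x):
        return np.searchsorted(ordered, x, side="right") / ordered.size

    return cdf


def induced_distance(cdf, a, b):
    """Distance between two points of X, measured on the interval [0, 1]."""
    return abs(cdf(a) - cdf(b))


# Hypothetical observations of points of X.
F = empirical_cdf([1.0, 2.0, 2.5, 10.0])
near = induced_distance(F, 1.0, 2.0)
far = induced_distance(F, 2.0, 10.0)
```

Points are ordered by their cumulative values, and two points are close when little probability lies between them. Where the cumulative distribution is flat the distance is zero, which corresponds to the restriction in the text to the non-trivial portion of X.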

Actually, it turns out that, especially if the dimensionality of the space is greater than one, the foregoing procedure not only provides one metrization, but many. Indeed, this lack of uniqueness is what makes the procedure exceedingly difficult. Only by imposing some additional conditions that result in the existence of a unique solution, does the problem become tractable.

We choose to impose the additional condition that the resulting metric space be a Euclidean geometry with a rectangular coordinate system. [Pg 67]

Even this does not always yield uniqueness, but we will show the additional restriction that will guarantee uniqueness after the necessary language is developed. Since all metrizations of a given metrizable topology are isomorphic, in the quotient class the orthogonal Euclidean geometry serves the purpose of being a convenient representative of the unique element resulting from a given metrizable topology.

Furthermore, the same comment applies to the use of a Gaussian distribution as the probability distribution on this orthogonal Euclidean geometry. Namely, the random Gaussian distribution on an orthogonal Euclidean geometry is a convenient representative member of the equivalence class which maps into one element (stochastic space) of the quotient class.

Information Theory

Now, we will show that Information Theory provides the language necessary to describe the metrization procedure in detail.

It is possible to introduce Information Theory axiomatically by a suitable generalization of the axioms[16] in Feinstein.[17] But to simplify the discussion here, we will use the less elegant but equivalent method of defining certain definite integrals. The probability density distribution p is defined from the cumulative probability distribution P by

\[ P(X') = \int_{X'} p(x)\,dx, \qquad X' \text{ measurable} \subset X. \tag{1} \]

Then the information rate H is defined as

\[ H(X) = -\int_{X} p(x) \ln[\kappa\, p(x)]\,dx \tag{2} \]

where κ has (carries) the units of X. Finally, the channel rate R is defined as

\[ R\Bigl(\bigodot_{I} X_i\Bigr) = \sum_{I} H(X_i) - H(X), \tag{3} \]

where X is the denumerable[18] cartesian product space

\[ X = \bigotimes_{I} X_i. \tag{4} \]

[Pg 68]

Next, we define the angle Θ

\[ \Bigl|\Theta\Bigl(\bigodot_{I} X_i\Bigr)\Bigr| = \sin^{-1} e^{-R\left(\bigodot_{I} X_i\right)} \tag{5} \]

and the norm

\[ |X| = \kappa (2\pi e)^{-1/2}\, e^{H(X)}. \tag{6} \]
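
As a check on these definitions, consider a special case that is assumed here purely for illustration: a one-dimensional Gaussian density with standard deviation σ, and a jointly Gaussian pair with correlation coefficient ρ, the constant κ of the product space being the product of the individual constants so that the units cancel in R. Evaluating the integral of Eq. (2) gives

\[
H(X) = \ln\frac{\sigma\sqrt{2\pi e}}{\kappa},
\qquad\text{so that}\qquad
|X| = \kappa\,(2\pi e)^{-1/2}\, e^{H(X)} = \sigma .
\]

For the pair,

\[
R = -\tfrac{1}{2}\ln\bigl(1-\rho^{2}\bigr),
\qquad
|\Theta| = \sin^{-1}\sqrt{1-\rho^{2}},
\qquad
\cos\Theta = |\rho| .
\]

In this case the norm is the familiar standard deviation, the cosine of the angle is the magnitude of the correlation coefficient, and statistically independent components (ρ = 0) are at right angles.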

Now, if[19] a statistically independent basis; i.e., one for which

\[ R\Bigl(\bigodot_{I} X_i\Bigr) \overset{\kappa}{\equiv} \text{constant}, \tag{7} \]

can be provided in terms of one-dimensional components; i.e., none of them can be decomposed further, then it is just the usual problem of diagonalization of a symmetric matrix by means of a congruence transformation to provide an orthogonal coordinate system. Furthermore, for uniqueness, we arrange the spectrum in decreasing order. Then, by means of the Radon-Nikodym theorem applied to each of these one-dimensional axes, the probability distribution may be made Gaussian, for example, if desired. Thus, we obtain the promised orthogonal Euclidean space.
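
The diagonalization step can be sketched with a standard eigenvalue routine. The sketch assumes that the symmetric matrix in question is a covariance matrix of the basis components and uses an orthogonal transformation, which is a special case of a congruence transformation. The matrix entries are hypothetical.

```python
import numpy as np


def orthogonal_axes(symmetric_matrix):
    """Diagonalize a symmetric matrix and order the spectrum decreasingly."""
    eigenvalues, eigenvectors = np.linalg.eigh(symmetric_matrix)
    order = np.argsort(eigenvalues)[::-1]        # largest eigenvalue first
    return eigenvalues[order], eigenvectors[:, order]


# Hypothetical covariance of two correlated one-dimensional components.
cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])

spectrum, axes = orthogonal_axes(cov)
diagonal = axes.T @ cov @ axes                   # covariance in the new axes
```

In the new coordinate system the off-diagonal terms vanish, so the components are uncorrelated, and the decreasing order of the spectrum fixes which axis comes first.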

Channel

At this time we can state the remaining additional condition required that a decomposition be unique. The index space I has to be partitioned into exactly two parts, say I′ and I″; i.e.,

\[ I' \cup I'' = I, \qquad I' \cap I'' = \varnothing, \tag{8} \]

such that

\[ \dim(X') = \dim(X''), \tag{9} \]

where

\[ X' = \bigotimes_{I'} X_i, \qquad X'' = \bigotimes_{I''} X_i. \tag{10} \]

[Pg 69] (If dim (X) is odd, then we have to cheat a little by putting in an extra random dummy dimension.) And then the decomposition of the space

\[ X = \bigotimes_{I} X_i \tag{11} \]

has to be carried out so that this partitioning is preserved. Since this partitioning is arbitrary (as far as the mathematics is concerned), it is obvious that a space which is not partitioned will have many (equivalent) decompositions. On the other hand, if the partitioning is into more than two parts, then the existence of a decomposition is not guaranteed.

A slight penalty has to be paid for the use of this partitioning, namely: instead of eventually obtaining a random cartesian product of one-dimensional spaces, we obtain an extended channel (with random input) of single-dimensional channels. It is obvious that if we were to drop the partitioning temporarily, each such single-dimensional channel would be further decomposed into two random components. This decomposition is not unique. But one of these equivalent decompositions is particularly convenient; namely, that decomposition where we take the component out of the original X′ and that which is random to it, say V. This V (as well as the cartesian product of all such V’s, which of necessity are random) is called the linearly additive noise. The name “linearly additive” is justified because it is just the statistical concept isomorphic to the linear addition of vectors in orthogonal Euclidean geometry. (The proof of this last statement is not completed as yet.)

Denumerable Space

The procedure for this decomposition was worded to de-emphasize the possible presence of a denumerable (component of the) space. Such a component may be given outright; otherwise, it results if the space was not simply connected. Any denumerable space is zero dimensional, as may be verified easily from the full information theoretic definition of dimensionality.

The obvious way of disposing of a denumerable space is to use the conventional mapping that converts a Stieltjes to a Lebesgue integral, using fixed length segments. (It can be shown that H is invariant under such a mapping.) Unfortunately, while this mapping followed by a repetition of the preceding procedure will always solve a [Pg 70] given problem (no new[20] denumerable component need be generated on the second pass), little insight is provided into the structure of the resulting space. On the other hand, because channels under cascading constitute a group, any such denumerable space is a representation of a denumerable group.

SUMMARY

In summary, the original metrizable topological space was decomposed into an orthogonal Euclidean space times[21] a denumerable random cartesian product of irreducible (wrt direct product) denumerable groups. Thus, since any individual component of a random cartesian product may be studied independently of the others, all that one needs to study is: (1) a Gaussian distribution on a single real axis and (2) the irreducible denumerable groups.

Finally, it should be emphasized that there are only these two ways of decomposing a metrizable topology: (1) if a (statistical) basis is given, use the diagonalization of a symmetric matrix algorithm described earlier (and given in detail in the three channels in cascade problem), and (2) otherwise use a suitable network of the NPO’s with n₀=1. Of course, any hybrid of these two methods may be employed as well.


[Pg 71]

On Functional Neuron Modeling

C. E. Hendrix

Space-General Corporation
El Monte, California

There are two very compelling reasons why mathematical and physical models of the neuron should be built. First, model building, while widely used in the physical sciences, has been largely neglected in biology; yet there can be little doubt that building neuron models will increase our understanding of the function of real neurons, if experience in the physical sciences is any guide. Secondly, neuron models are extremely interesting in their own right as new technological devices. Hence the interest in, and the reason for, symposia on self-organizing systems.

We should turn our attention to the properties of real neurons, and see which of them are the most important ones for us to imitate. Obviously, we cannot hope to imitate all the properties of a living neuron, since that would require a complete simulation of a living, metabolizing cell, and a highly specialized one at that; but we can select those functional properties which we feel are the most important, and then try to simulate those.

The most dramatic aspect of neuron function is, of course, the axon discharge. It is this which gives the neuron its “all-or-nothing” character, and it is this which provides it with a means for propagating its output pulses over a distance. Hodgkin and Huxley (1) have developed a very complete description of this action. Their model is certainly without peer in describing the nature of the real neuron.

On the technological side, Crane’s “neuristors” (2) represent a class of devices which imitate the axonal discharge in a gross sort of way, without all the subtle nuances of the Hodgkin-Huxley model. Crane has shown that neuristors can be combined to yield the various Boolean functions needed in a computer.

However, interesting as such models of the axon are, there is some question as to their importance in the development of self-organizing systems. The pulse generation, “all-or-nothing” part of the axon behavior could just as well be simulated by a “one-shot” trigger circuit. The transmission characteristic of the axon is, after all, only Nature’s way of sending a signal from here to there. It is an [Pg 72] admirable solution to the problem, when one considers that it evolved, and still works, in a bath of salt water. There seems little point, however, in a hardware designer limiting himself in this way, especially if he has an adequate supply of insulated copper wire.

If the transmission characteristic of the axon is deleted, the properties of the neuron which seem to be the most important in the synthesis of self-organizing systems are:

a. The neuron responds to a stimulus with an electrical pulse of standard size and shape. If the stimulus continues, the pulses occur at regular intervals with the rate of occurrence dependent on the intensity of stimulation.

b. There is a threshold of stimulation. If the intensity of the stimulus is below this threshold, the neuron does not fire.

c. The neuron is capable of temporal and spatial integration. Many subthreshold stimuli arriving at the neuron from different sources, or at slightly different times, can add up to a sufficient level to fire the neuron.

d. Some inputs are excitatory, some are inhibitory.

e. There is a refractory period. Once fired, there is a subsequent period during which the neuron cannot be fired again, no matter how large the stimulus. This places an upper limit on the pulse rate of any particular neuron.

f. The neuron can learn. This property is conjectural in living neurons, since it appears that at the present time learning has not been clearly demonstrated in isolated living neurons. However, the learning property is basic to all self-organizing models.

Neuron models with the above characteristics have been built, although none seems to have incorporated all of them in a single model. Harmon (3) at Bell Labs has built neuron models which have characteristics (a) through (e), and with them has built extremely interesting devices which simulate portions of the peripheral nervous system.

Various attempts at learning elements have been made, perhaps best exemplified by those of Widrow (4). These devices are capable of “learning,” but are static, and lack all the temporal characteristics listed in (a) through (e). Such devices can be used to deal with temporal patterns only by a mapping technique, in which a temporal pattern is converted to a spatial one.
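The mapping technique mentioned above is not specified further in the text. One common way to picture it is a tapped delay line, in which the most recent samples of a signal are presented side by side to a static element. The sketch below assumes that arrangement; the function name `delay_line_frames` and the parameter `taps` are illustrative only.

```python
from collections import deque


def delay_line_frames(samples, taps):
    """Yield, for each incoming sample, the last `taps` samples as one tuple.

    Each tuple is a "spatial" pattern: position in the tuple stands for
    how long ago the sample arrived (oldest first, newest last).
    """
    # The line starts empty (all zeros), as an idle delay line would.
    line = deque([0] * taps, maxlen=taps)
    for sample in samples:
        line.append(sample)  # the oldest sample drops off the far end
        yield tuple(line)


frames = list(delay_line_frames([1, 0, 1], taps=2))
```

Each frame could then be applied to a static learning element as if it were a set of simultaneous inputs.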

Having listed what seem to be the important properties of a neuron, we can synthesize a simple model which has all of them. [Pg 73]

A number of input stimuli are fed to the neuron through a resistive summing network which establishes the threshold and accomplishes spatial integration. The voltage at the summing junction triggers a “one-shot” circuit, which, by its very nature, accomplishes pulse generation and exhibits temporal integration and a refractory period. The polarity of an individual input determines whether it shall be excitatory or inhibitory. This much of the circuitry is very similar to Harmon’s model.
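The behavior of the summing network and one-shot can be sketched in discrete time. This is only an illustration under stated assumptions, not the circuit itself: the class name `OneShotNeuron`, the `decay` factor standing in for the temporal integration of the circuit, and the particular numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OneShotNeuron:
    threshold: float = 1.0     # firing level at the summing junction (arbitrary units)
    decay: float = 0.5         # assumed fraction of junction voltage kept per step
    refractory_steps: int = 2  # steps after a pulse during which firing is blocked
    voltage: float = 0.0
    refractory_left: int = 0

    def step(self, inputs):
        """Advance one time step; return 1 if a pulse is emitted, else 0.

        Positive inputs are excitatory, negative inputs inhibitory.
        """
        if self.refractory_left > 0:
            # Refractory period: no stimulus, however large, can fire the neuron.
            self.refractory_left -= 1
            self.voltage = 0.0
            return 0
        # Spatial integration: inputs are summed. Temporal integration:
        # part of the earlier junction voltage is carried over.
        self.voltage = self.decay * self.voltage + sum(inputs)
        if self.voltage >= self.threshold:
            self.voltage = 0.0
            self.refractory_left = self.refractory_steps
            return 1  # pulse of standard size
        return 0


neuron = OneShotNeuron()
pulses = [neuron.step([0.75]) for _ in range(8)]  # steady subthreshold drive
```

With a steady stimulus the pulses recur at regular intervals, and the refractory period sets the upper limit on their rate.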

Learning is postulated to take place in the following way: when the neuron fires, an outside influence (the environment, or a “trainer”) determines whether or not the result of firing was desirable. If it was desirable, the threshold of the neuron is lowered, making it easier to fire the next time. If the result was not desirable, the threshold is raised, making it more difficult for the neuron to fire the next time.
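The postulated rule can be written in a few lines. The sketch below assumes a fixed adjustment `step` and a lower bound `floor` on the threshold; neither is given in the text, and the function name `adjust_threshold` is illustrative.

```python
def adjust_threshold(threshold, fired, desirable, step=0.25, floor=0.0):
    """Return the threshold after the trainer has judged one time step.

    `step` and `floor` are assumed values, not taken from the paper.
    """
    if not fired:
        return threshold  # only a firing is judged
    if desirable:
        # Desirable result: lower the threshold, so firing is easier next time.
        return max(floor, threshold - step)
    # Undesirable result: raise the threshold, so firing is harder next time.
    return threshold + step


rewarded = adjust_threshold(1.0, fired=True, desirable=True)
punished = adjust_threshold(1.0, fired=True, desirable=False)
```

The bound `floor` is added only so that repeated rewards cannot drive the threshold negative in the sketch.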

In a self-organizing system, many model neurons would be interconnected. A “punish-reward” (P-R) signal would be connected to all neurons in common. However, means would be provided for only those which have recently fired to be susceptible to the effects of the P-R signal. Therefore, only those which had taken part in a recent response are modified. This idea is due to Stewart (5), who applies it to his electrochemical devices instead of to an electronic device.

The mechanization of the circuitry is rather straightforward. A portion of the output of the pulse generator is routed through a “pulse-stretcher” or short-term memory which temporarily records the fact that the neuron has recently fired. The pulse-stretcher output controls a gate, which either accepts or rejects the P-R signal. The P-R signal can take on only three values: a positive level, zero, or a negative level, depending on whether the signal is “punish,” “no action,” or “reward.” Finally, the gate output controls a variable resistor, which is part of the resistive summing network. Figure 1 is a block diagram of the complete model.
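The chain of pulse-stretcher, gate, and variable resistor can be sketched as follows. The sketch reads the three P-R levels in the order given (positive for “punish,” negative for “reward”), which is an assumption about polarity; the class name `GatedThreshold`, the memory length `stretch_steps`, and the adjustment `step` are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed polarity of the three levels on the common P-R bus.
PUNISH, NO_ACTION, REWARD = +1, 0, -1


@dataclass
class GatedThreshold:
    threshold: float = 1.0
    step: float = 0.25      # assumed change per step while the gate is open
    stretch_steps: int = 3  # assumed length of the pulse-stretcher memory
    memory_left: int = 0

    def tick(self, fired, pr_signal):
        """Advance one time step and return the (possibly modified) threshold."""
        if fired:
            # The pulse-stretcher records that the neuron has just fired.
            self.memory_left = self.stretch_steps
        if self.memory_left > 0:
            # Gate open: punish raises the threshold, reward lowers it.
            self.threshold += self.step * pr_signal
            self.memory_left -= 1
        # Gate closed: the P-R signal is rejected and nothing changes.
        return self.threshold


unit = GatedThreshold()
history = [unit.tick(fired, pr) for fired, pr in
           [(False, REWARD), (True, NO_ACTION), (False, REWARD)]]
```

In a net of such units all would see the same P-R level, yet only those whose memory is still set would change.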

Note that this device differs from the usual “Perceptron” configuration in that the threshold resistor is the only variable element, instead of having each input resistor a variable weighting element. This simplification could lead to a situation where, to perform a specified task, more single-variable neurons would be required than would multivariable ones. This possible disadvantage is at least partially offset by the very simple control algorithm which is contained in the design of the model; the control algorithm is not the matter of great concern which it seems to be for most multivariable models. [Pg 74]

Figure 1—Block diagram of neuron model

Hand simulations of the action of this type of model suggest that a certain amount of randomness would be desirable. It appears that a self-organizing system built of these elements, and of sufficient complexity to be interesting, would have a fair number of recirculating loops, so that spontaneous activity would be maintained in the absence of input stimulus. If this is the case, then randomness could easily be introduced by adding a small amount of noise from a random noise generator to the signal on the P-R bus. Thus, any neurons which spontaneously fire would be continually having their thresholds modified.
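The effect of noise on the P-R bus can be illustrated with a short sketch. It assumes uniformly distributed, zero-mean noise and a threshold change proportional to the bus level; the names `noisy_pr` and `drift` and the values of `gain` and `noise_amplitude` are hypothetical.

```python
import random


def noisy_pr(pr_level, noise_amplitude=0.1, rng=random):
    """Return the P-R bus level with a small zero-mean noise term added."""
    return pr_level + rng.uniform(-noise_amplitude, noise_amplitude)


def drift(threshold, recently_fired, pr_level,
          gain=0.25, noise_amplitude=0.1, rng=random):
    """Return the threshold of one neuron after one step of the noisy bus.

    Only a neuron that has recently fired is susceptible to the bus,
    so only such a neuron has its threshold disturbed by the noise.
    """
    if not recently_fired:
        return threshold
    return threshold + gain * noisy_pr(pr_level, noise_amplitude, rng)


rng = random.Random(0)  # seeded generator, so that a run can be repeated
wandering = drift(1.0, recently_fired=True, pr_level=0, rng=rng)
```

Even with the P-R signal at “no action,” the threshold of a spontaneously firing neuron wanders by a small random amount at each step.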

The mechanization of the model is not particularly complex, and can be estimated as follows: The one-shot pulse generator would require two transistors, the pulse stretcher one more. The bi-directional gate would require a transistor and at least two diodes.

Several candidates for the electrically-controllable variable resistor are available (6). Particularly good candidates appear to be the “Memistor” or plating cell developed by Widrow (7), the solid state version of it by Vendelin (8), and the “solion” (9). All are electrochemical devices in which the resistance between two terminals is controlled by the net charge flow through a third terminal. All are adaptable to this particular circuit.

Of the three, however, the solion appears at first glance to have the most promise in that its resistance is of the order of a few thousand ohms (rather than the few ohms of the plating cells) which is more compatible with ordinary solid-state circuitry. Solions have the disadvantage that they can stand only very low voltages (less than 1 volt) and in their present form require extra bias potentials. If these difficulties can be overcome, they offer considerable promise. [Pg 75]

In summary, it appears that a rather simple neuron model can be built which can mimic most of the important functions of real neurons. A system built of these could be punished or rewarded by an observer, so that it could be trained to give specified responses to specified stimuli. In some cases, the observer could be simply the environment, so that the system would learn directly from experience, and would be therefore a self-organizing system.

REFERENCES

1. Hodgkin, A. L., and Huxley, A. F.,
  “A Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve,”
  J. Physiol. 117:500-544 (August 1952)
2. Crane, H. D.,
  “Neuristor—A Novel Device and System Concept,”
  Proc. IRE 50:2048-2060 (Oct. 1962)
3. Harmon, L. D., Levinson, J., and Van Bergeijk, W. A.,
  “Analog Models of Neural Mechanism,”
  IRE Trans. on Information Theory IT-8:107-112
  (Feb. 1962)
4. Widrow, B., and Hoff, M. E.,
  “Adaptive Switching Circuits,”
  Stanford Electronics Lab Tech Report 1553-1, June 1960
5. Stewart, R. M.,
  “Electrochemical Wave Interactions and Extensive Field Effects in Excitable Cellular Structures,”
  First Pasadena Invitational Symposium on Self-Organizing Systems,
  Calif. Institute of Technology, Pasadena, Calif., 14 Nov. 1963
6. Nagy, G.,
  “A Survey of Analog Memory Devices,”
  IEEE Trans. on Electronic Cmptrs. EC-12:388-393 (Aug. 1963)
7. Widrow, B.,
  “An Adaptive Adaline Neuron Using Chemical Memistors,”
  Stanford Electronics Lab Tech Report 1553-2, Oct. 1960
8. Vendelin, G. D.,
  “A Solid State Adaptive Component,”
  Stanford Electronics Lab Tech Report 1853-1, Jan. 1963
9. “Solion Principles of Electrochemistry and Low-Power Electrochemical Devices,”
  Dept. of Comm., Office of Tech. Serv. PB 131931
  (U. S. Naval Ord. Lab., Silver Spring, Md., Aug. 1958)

[Pg 76]

Selection of Parameters for
Neural Net Simulations[22]

R. K. Overton

Autonetics Research Center
Anaheim, California

Research of high quality has been presented at this Symposium. Of particular interest to me were the reports of the Aeronutronic group and the Librascope group. The Aeronutronic group was commendably systematic in its investigations of different arrangements of linear threshold elements, and the Librascope data, presenting the effects of attaching different values to the parameters of simulated neurons, are both systematic and interesting.

Unfortunately, however, interest in such research can obscure a more fundamental question which seems to merit study. That question concerns the parameters, or attributes, which describe the simulated neuron. Specifically, which parameters or attributes should be selected for simulation? (For example, should a period of supernormal sensitivity be simulated following an absolutely refractory period?)

Some selection obviously has to be made. Librascope, which is trying to simulate neurons more or less faithfully, plans to build a net of ten simulated neurons. In contrast, General Dynamics/Fort Worth, with roughly the same degree of effort, is working with 3900 unfaithfully-simulated neurons. This comparison is not a criticism of either group; the Librascope team has simply selected many more parameters for simulation than has the General Dynamics group. Each can make the selections it prefers, because the parameters of real neurons which are necessary and sufficient for learning have not been exhaustively identified.

From the point of view of one whose interests include real neurons, this lack of identification is unfortunate. I once wrote a book which included some guesses about the essential attributes of neurons. Since that time, many neuron simulation programs have been written. But these programs, although interesting and worthwhile in their own right, have done little to answer the question of the necessary parameters. That is, they do not make much better guesses possible. And yet better guesses would also make for more “intelligent” machines.


[Pg 77]

INDEX OF INVITED PARTICIPANTS

MICHAEL ARBIB Massachusetts Institute of Technology
ROBERT H. ASENDORF Hughes Research Laboratories/Malibu
J. A. DALY Astropower/Newport Beach
GEORGE DeFLORIO System Development Corp./Santa Monica
DEREK H. FENDER California Institute of Technology
LEONARD FRIEDMAN Space Technology Labs./Redondo Beach
JAMES EMMETT GARVEY ONR/Pasadena
THOMAS L. GRETTENBERG California Institute of Technology
HAROLD HAMILTON Librascope/Glendale
JOSEPH HAWKINS Aeronutronic/Newport Beach
CHARLES HENDRIX Space-General Corp./El Monte
R. D. JOSEPH Astropower/Newport Beach
PETER A. KLEYN Nortronics/Anaheim
JOHN KUHN Space-General Corp./El Monte
FRANK LEHAN Space-General Corp./El Monte
EDWIN LEWIS Librascope/Glendale
PETER C. LOCKEMANN California Institute of Technology
GILBERT D. McCANN California Institute of Technology [Pg 78]
C. J. MUNCIE Aeronutronic/Newport Beach
C. OVERMIER Nortronics/Anaheim
RICHARD K. OVERTON Autonetics/Anaheim
DIANE RAMSEY Astropower/Newport Beach
RICHARD REISS Librascope/Glendale
R. I. ŚCIBOR-MARCHOCKI Nortronics/Anaheim
JAMES J. SPILKER Philco/Palo Alto
ROBERT M. STEWART Space-General Corp./El Monte
HENNIG STIEVE California Institute of Technology
RICHARD TEW Space-General Corp./El Monte
JOHN THORSEN University of California/Los Angeles
RICHARD VINETZ Librascope/Glendale
CHRISTOPH von CAMPENHAUSEN California Institute of Technology
DAVID VOWLES California Institute of Technology
HORST WOLF Astropower/Newport Beach

U.S. GOVERNMENT PRINTING OFFICE: 1966 O—205-502


Footnotes:

[1] For review articles see: Lillie (13), Franck (6).

[2] The operation of this machine is described in substantially greater detail in J. J. Spilker, Jr., D. D. Luby, R. D. Lawhorn, “Adaptive Binary Waveform Detection,” Philco Western Development Laboratories, Communication Sciences Department, TR #75, December 1963.

[3] F. M. Glaser, “Signal Detection by Adaptive Filters,” IRE Trans. Information Theory, pp. 87-90; April 1961.

[4] P. W. Cooper, “The Hypersphere in Pattern Recognition,” Information and Control, pp. 324-346; December 1962.

[5] Observed from Oscillogram

[6] Computed

[7] Observed from Oscillogram

[8] Kleyn, P. A., “Conceptual Design of Self-Organizing Machines,” Anaheim, California:Northrop Nortronics, NSS Report 2832, Nov. 14, 1963.

[9] Random cartesian product.

[10] Kleyn, P. A., “Conceptual Design of Self-Organizing Machines,” Anaheim, California:Northrop Nortronics, NSS Report 2832, Nov. 14, 1963.

[11] Harman, W. W., “Principles of the Statistical Theory of Communication,” New York, New York:McGraw-Hill, 1963.

[12] Munroe, M. E., “Introduction to Measure and Integration,” Cambridge, Mass.:Addison-Wesley, 1953.

[13] Munroe, M. E., “Introduction to Measure and Integration,” Cambridge, Mass.:Addison-Wesley, 1953.

[14] Halmos, P. R., “Measure Theory,” Princeton, New Jersey:D. Van Nostrand Co., Inc., 1950.

[15] Kelley, J. L., “General Topology,” Princeton, New Jersey:D. Van Nostrand Co., Inc., 1955.

[16] Feinstein uses his axioms only in finite space X; i.e., card(X) < ℵ₀.

[17] Feinstein, A., “Foundations of Information Theory,” New York, New York: McGraw-Hill, 1958.

[18] If I is infinite, certain precautions have to be exercised.

[19] This “if” is the catch that makes all methods of metrization of a space of dimensionality higher than one impractical, except the method of successive projections upon unit spheres centered at the center of gravity. The method of using that nilpotent projection operator is described in the companion paper (see footnote page 65).

[20] Only non-cyclic irreducible (wrt direct product) denumerable group components of the old denumerable space will remain.

[21] Random cartesian product.

[22] This paper, submitted after the Symposium, represents a more detailed presentation of some of the issues raised in the discussion sessions at the Symposium and hence, constitutes a worthwhile addition to the Proceedings.

Transcriber’s Notes:


The illustrations have been moved so that they do not break up paragraphs and so that they are next to the text they illustrate.

Typographical and punctuation errors have been silently corrected.

A heavy bar on top of a letter indicates a vector, e.g. M̄ means “the vector M”.