NEURAL NETWORKS AND MUSIC 1
Running Head: NEURAL NETWORKS AND MUSIC
A Modern View on the “Third Culture” Movement:
Neural Networks and Music
Abigail L. Kleinsmith
State University of New York at Oswego
Introduction
In 1959, novelist Charles P. Snow delivered an extremely influential lecture entitled “The
Two Cultures and the Scientific Revolution” in Cambridge, England. He had published a
paper on the same topic three years earlier, but his lecture is what propelled his ideas into the
realm of common knowledge. He lamented what appeared to be an insurmountable divide
between the “two cultures”: those of the sciences and the humanities (Graves, 1971). According
to Snow, artists believe that scientists are “shallowly optimistic” and “unaware of man’s
condition” (Graves, 1971), while scientists think that artists are “totally lacking in foresight” and
are “in a deep sense, anti-intellectual” (Graves, 1971). He proposed an innovative solution in the
form of a “third culture”, which would not only bridge the gap, but also revel in an enhanced and
more comprehensive view of our world (Lehrer, 2007).
Unfortunately, this grand vision has yet to be actualized (Lehrer, 2007). There is still a
large divide between the humanities and the sciences, which is not surprising. As an individual
with deep interests in both of these ‘cultures’, I find it pertinent to attempt to find some common
ground in my own fashion. As I am currently engaged in musical performance and in the study of
neural networks and cognitive psychology, I feel compelled to contribute a paper in the form of a
discussion that will attempt to illuminate the striking similarities between these fields of study.
This paper will present a preliminary investigation into the convergence of two fields of
human interest, namely computational modeling and music. I hope that this examination will
provide supporting evidence for a neural network’s ability to simulate and learn about aspects of
music in a practical and applicable way. In doing so, it will present a practical (if
small-scale) application of C. P. Snow’s “third culture” theory.
Music and Cognition
Music is inextricably bound to the development and evolution of the
human race. Some scholars have openly disputed the power and importance of music;
Steven Pinker has famously referred to music as “auditory cheesecake”, meaning that it is ‘nice
to have’, but is ultimately unnecessary (Levitin, 2007). Many others disagree with
Pinker, among them Daniel Levitin of McGill University in Canada. The author of “This Is Your Brain on
Music”, Levitin firmly believes that music evolved alongside the
language of our species, and is in fact directly tied to our higher development (Levitin,
2007).
We have yet to discover a culture which lacks some form of music. Donald Brown’s
book entitled “Human Universals” presents an extensive list of attributes which he believes to be
widespread across cultures, one of which is music. There have been updates and additions to his
book since its publication in 1991, but there remains a rather large section dedicated to music. He
lists such universals as: children’s music, musical redundancy, music as seen in art, musical
variation, music as related to social functions, and music as a religious activity (Brown, 1991).
This example demonstrates the apparent importance of music from a cultural perspective.
In order to discuss the issue at hand empirically, we must address music’s effect on the
human brain. The fact that musical activities tend to activate areas in nearly every part of the
brain lends support to the idea that music has developed with our species. If a creative construct
has the power to simultaneously activate various cortical areas, it could help to develop stronger
pervasive bonds between the activated neurons. Recent topics documented by William F.
Thompson in his book “Music, Thought, and Feeling: Understanding the Psychology of Music”
include the emotional effects of music on the human limbic and cortical systems, and the ability
of pleasant music to activate the parietal, frontal, and temporal lobes (Thompson, 2009).
In keeping with the emotional aspects of music cognition, I performed a study in 2009
that examined the potential of musical key and tempo to alter a person’s affect. I recorded one
piece of music played in four different ways (major key/fast tempo, major key/slow tempo,
minor key/fast tempo, minor key/slow tempo); participants listened to one of the four versions
while they read an intentionally ambiguous story. Participants were then asked questions about
both the character’s mood and state of being. Interestingly, there was a strong association between
key and projected affect; participants overwhelmingly perceived the character as being happy
when they heard the piece in a major key, and sad when they heard it in a minor key
(ANOVA, p = 0.006) (Kleinsmith, 2009). Also notable is that only two of the individuals
surveyed had any kind of musical training, formal or otherwise (N = 24). This seems to imply
that there is something implicit about music’s ability to alter our affective state, and this concept
aligns with some of the documented ideas of Charles Darwin.
Darwin’s evolutionary theory helps to support the idea that music has an adaptive
component. The survival of a species can be contingent upon its members’ ability to engage in group
cohesion and cooperation (Thompson, 2009). Music can help to synchronously engage
individuals in a rhythmic activity such as marching, clapping, or drumming. If a group of people
forms an interconnected unit, they have greater chances for survival. A disconnected and chaotic
group will not achieve the same level of performance (Thompson, 2009). As such, one can
theorize that groups who engage in synchronous and rhythmic activities may increase their
chances of survival.
The study of music’s effect on the human brain is a developing field, due in part to the
emergence of cognitive science as a reliable field of study. Groups focusing their efforts on the
interaction of music and the brain, such as the Society for Music Perception and Cognition
(SMPC), have been founded relatively recently (1990). The SMPC focuses its efforts on expanding our
knowledge about music by studying it empirically from various angles. This is clearly a
burgeoning field of study which has the potential to actualize C. P. Snow’s “third culture”
theory.
General concepts of neural networks
Computational modeling is a useful tool for discussing human cognition, and artificial
neural networks are one of the more common types of modeling discussed in the literature.
Barbara Tillmann provides a nice summary of the purpose of neural networks; she says that “the
goal of artificial networks is not to describe neural anatomy and physiology, but to be founded
on neural principles in order to simulate different levels of perceptual and cognitive processing”
(Peretz & Zatorre, 2003). In my opinion, the degree to which a neural network simulates varying
levels of processing defines its usefulness. Additionally, there needs to be a “biologically
realistic” aspect to the simulation, because if a simulation’s output is not applicable to a real
situation, then its usefulness decreases drastically.
Connectionism is another topic which is critically linked to both artificial neural
networks and the issues presented in this paper. O’Reilly and Munakata discuss
connectionism, also known as parallel distributed processing, in their textbook about
computational modeling, along with the error-driven learning procedure known as
backpropagation (O’Reilly & Munakata, 2000). Backpropagation functions as a
mechanism which identifies and attempts to correct errors within a network by adjusting the
connection weights so that the network’s output moves closer to a target. The importance of
interconnectivity cannot be ignored, as it is a basic
tenet of neural networks. The interaction of different nodes in a network is what ultimately
produces an output far greater than what the nodes could have produced
individually.
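To make the idea of error-driven weight adjustment concrete, the following is a minimal sketch in Python. It trains a single sigmoid unit with the delta rule, which is the one-layer special case of backpropagation rather than a full multi-layer implementation. The function names, the learning rate, the number of epochs, and the toy logical-OR data are all assumptions chosen for illustration; none of them come from O’Reilly and Munakata.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(samples, epochs=2000, lr=0.5):
    """Train one sigmoid unit by gradient descent on squared error (delta rule)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            net = sum(w * x for w, x in zip(weights, inputs)) + bias
            out = sigmoid(net)
            # Error signal: (target - output) scaled by the slope of the sigmoid.
            delta = (target - out) * out * (1.0 - out)
            # Each weight moves in proportion to its input and the error signal.
            weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
            bias += lr * delta
    return weights, bias

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hypothetical toy task: learn logical OR from four examples.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_unit(OR_DATA)
for inputs, target in OR_DATA:
    print(inputs, target, round(predict(weights, bias, inputs), 2))
```

In a multi-layer network the same error signal would be passed backwards through the hidden layers, which is where the name backpropagation comes from.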
The two neural networks presented in this paper make use of self-organizing
maps (SOMs), also known as the Kohonen algorithm. The Kohonen algorithm is
classified as a competitive type of neural network, and is associated with unsupervised learning
(O’Reilly & Munakata, 2000). Consequently, these networks are also associated with Hebbian
learning. The SOM is a computational tool for analyzing a network’s output. It also
acts in a reductionist manner, condensing high-dimensional data into a manageable
two-dimensional form (Toiviainen, 1996). A SOM represents multiple relations between data as well;
the proximity of one node to another can indicate that the two are similar. The Kohonen
algorithm provides a concise and effective way of discussing algorithmic musical networks.
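The way a self-organizing map condenses data onto a two-dimensional grid can be sketched in a few lines of Python. This is a generic, minimal version of the Kohonen update and not the configuration used in either network discussed in this paper. The grid size, learning rate, neighbourhood radius, decay schedule, and the four three-dimensional input vectors are all hypothetical values chosen for illustration.

```python
import math
import random

def best_matching_unit(grid, vec):
    """Return the grid coordinate whose weight vector is closest to vec."""
    return min(grid, key=lambda node: sum((w - v) ** 2 for w, v in zip(grid[node], vec)))

def train_som(data, rows=4, cols=4, epochs=100, lr0=0.5, radius0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised weight vector per node of the 2-D map.
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # linear decay over training
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for vec in data:
            br, bc = best_matching_unit(grid, vec)
            for (r, c), weights in grid.items():
                # Nodes near the winner on the map are pulled more strongly.
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    weights[i] += lr * influence * (vec[i] - weights[i])
    return grid

# Hypothetical inputs: two loose clusters in a 3-D feature space.
DATA = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 0.9], [1.0, 0.9, 1.0]]
som = train_som(DATA)
for vec in DATA:
    print(vec, "->", best_matching_unit(som, vec))
```

Each input ends up associated with one map coordinate, so distances on the grid can be read as a rough indication of similarity between inputs.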
Artificial neural networks of music
This discussion of neural networks and music could benefit from a brief consideration of
temporality. Jeffrey Elman’s groundbreaking 1990 paper “Finding Structure in Time”
discussed the importance of temporal aspects in relation to cognitive science, and there appears
to be no form of human cognition which relies more fundamentally upon temporality than music. For
Elman, “time is inextricably bound up with many behaviors which express themselves as
temporal sequences. Indeed, it is difficult to know how one might deal with such basic problems
as goal-directed behavior, planning, or causation without some way of representing time”
(Elman, 1990). It becomes apparent that studies of music cognition have an important place in
the realm of cognitive studies because time is intimately connected with music. For example, it
would be quite impossible to discuss the erratic melodic lines of Schoenberg’s string quartets
without discussing the concept of time.
Viewing music from this cognitive perspective allows us to create and solidify a
foundation for the purpose of discussing both fields on the same plane. It is important to have a
directed reason for discussing this unconventional application of neural networks. As this paper
will demonstrate, not only do artificial neural networks of music have an important place in the
study of the brain, but they can highlight and enhance scientists’ appreciation of the arts, and
consequently contribute to the development of a true “third culture”.
While artificial neural networks have long been used to demonstrate and describe a wide
variety of cognitive tasks and disorders, their use for the modeling of music is
only now emerging as a useful intellectual tool. While some individuals, such as Petri
Toiviainen (a professor of musicology at the University of Jyväskylä in Finland), have
focused their academic research careers almost solely upon the abilities of neural networks to
model music cognition, it is a very specific area of study and is therefore underrepresented in the
literature. Studying both of these seemingly disparate fields, and their interaction,
appears to require a great deal of knowledge in more than one area; this may be another
contributing factor to the lack of available empirical research on the topic. It now seems
pertinent to discuss some of Toiviainen’s research, as it provides a beautiful affirmation
of the idea that artificial neural networks can accurately and effectively model aspects of music.
In one of his studies, Toiviainen builds on Carol Krumhansl’s work on tonal
hierarchy within the Western twelve-tone chromatic scale (see Figure 1 below; “C” is repeated).
He utilizes a neural network which has been designed to recognize and classify notes within
bebop-style jazz improvisation (Toiviainen, 1996). Improvisation is of particular interest to
cognitive musicologists because of its inherently random nature. When musicians engage in
improvisation, they are operating under varying types and degrees of constraints (e.g., key of the
piece, tempo, musical meter). A computer simulation can model this by utilizing a ‘chaotic’
element that could randomly distribute weight values across the network (Toiviainen, 1996).
Figure 1: Twelve-tone Western chromatic scale, C major.
Toiviainen was particularly interested in the ability of a neural network to establish and
detect a sense of tonal hierarchy within the music, through the use of statistics. A tonal hierarchy
can be defined as a human’s intentional ordering of notes according to their importance to the
structure of the scale (Toiviainen, 1996). A tone may be perceived as more “important” than
another if it is a critical place-holder in terms of the scale itself. For example, in the key of C
major, the C is the most critical to the overall structure and development of the scale. It both
begins and ends the scale, and is the tone with which the scale is associated (see Figure 2 below).
The notes E and G are perceived as being the next most critical tones because of their importance
to the key’s triad (see Figure 3 below). The triad is a chord which defines the key in an
identifiable way (Toiviainen, 1996).
Figure 2: C Major scale
Figure 3: C Major Triad
It is interesting to note that many individuals will “pick up on” or sense these critical
structural architectures, even without any general knowledge of music theory. As demonstrated
earlier with my 2009 study, individuals who have absolutely no formal musical training will
respond to changes in musical key and demonstrate a change in affect as a direct result of the
music and the manipulation of independent variables. I believe that the tonal hierarchy within a
C major triad would be immediately identified and conceptually understood upon hearing an
auditory example.
Keeping this in mind, we can move on to an exploration of Toiviainen’s findings with
this particular neural network. The results of Krumhansl’s study (performed with actual human
listeners and not a neural network) can be seen in Figure 4; she demonstrated that there are very
definite perceived differences between the tones in a chromatic Western scale. The same pattern
is demonstrated in an imitative study executed by Järvinen and colleagues in 1995 (see Figure 5);
this time, the pattern is displayed in relation to the frequency with which the tones were present
in a set of fifty-six improvised samples of bebop-jazz. It would appear that there is a correlation
between the perceived importance of a note within the structure of a scale, and the frequency
with which it is heard in the actual music (Toiviainen, 1996).
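The relationship between perceived importance and frequency of occurrence can be illustrated with a short Python sketch that correlates a tonal hierarchy profile with the pitch-class counts of a melody. The profile values are the widely reproduced Krumhansl and Kessler probe-tone ratings for a major key, which I am assuming correspond to the data summarized in Figure 4. The melody is a hypothetical toy example and is not drawn from the Järvinen samples.

```python
import math
from collections import Counter

# Krumhansl-Kessler probe-tone ratings for a major key, indexed from the tonic
# (C, C#, D, ..., B in C major). Assumed to match the data behind Figure 4.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Hypothetical toy melody written as pitch classes (0 = C, 4 = E, 7 = G, ...).
TOY_MELODY = [0, 4, 7, 4, 0, 2, 4, 5, 7, 9, 7, 5, 4, 2, 0, 7, 0, 11, 0, 6]

def pitch_class_counts(melody):
    """Count how often each of the twelve pitch classes occurs."""
    counts = Counter(note % 12 for note in melody)
    return [counts.get(pc, 0) for pc in range(12)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

counts = pitch_class_counts(TOY_MELODY)
r = pearson(MAJOR_PROFILE, counts)
print("pitch-class counts:", counts)
print("correlation with major-key profile:", round(r, 2))
```

A strongly positive coefficient for a real corpus would be the numerical counterpart of the visual similarity between Figures 4 and 5.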
Figure 4: Tonal hierarchy (Krumhansl, 1990)
Figure 5: Tonal frequency within improvisational jazz (Järvinen, 1995)
This pattern of implied importance is replicated consistently with Toiviainen’s artificial
neural network. Figure 6 demonstrates the architecture of the network model. The network itself
has three true sources of input, although the architecture appears to indicate that there are more.
They are as follows:
1. (C) represents the Context Input and indicates information about the Present Chords
(PC) and the Following Chords (FC) being presented to the network.
2. (F) represents Feedback Input and can be thought of as a primitive type of short-term
memory, joining together musical patterns.
3. (E) represents External Input and accounts for the randomness of improvisation by
adding variation to the output patterns (Toiviainen, 1996).
Figure 6: Architecture of Toiviainen’s neural network (1996).
The network was trained to recognize jazz melodies as they were presented to it; Figure 7
demonstrates the network’s representation of an example melody. Toiviainen makes explicit
reference to the occurrence of Hebbian learning in this particular neural network when he
discusses how the network was trained to learn melodies. They were learned “by strengthening
the connections between the active neurons of the auto-associator” (Toiviainen, 1996). This idea
is consistent with concepts relevant to more general types of neural networks, and this point
serves to highlight the ability of neural networks to simulate musical concepts.
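Strengthening the connections between co-active units can be sketched as a small Hebbian auto-associator in Python. This is a generic Hopfield-style illustration and not a reconstruction of Toiviainen’s network, whose details are not given here. The two eight-unit patterns of +1 and -1 values are hypothetical stand-ins for encoded melodic fragments.

```python
def hebbian_train(patterns):
    """Build a weight matrix by adding the product of every pair of unit activities."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    # Units that are active together get a stronger connection.
                    weights[i][j] += pattern[i] * pattern[j]
    return weights

def recall(weights, cue):
    """One synchronous update: each unit takes the sign of its summed input."""
    result = []
    for row in weights:
        net = sum(w * x for w, x in zip(row, cue))
        result.append(1 if net >= 0 else -1)
    return result

# Hypothetical stored patterns (+1 = active unit, -1 = inactive unit).
PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
weights = hebbian_train([PATTERN_A, PATTERN_B])

# Present PATTERN_A with its first unit flipped and let the network restore it.
noisy_cue = [-1] + PATTERN_A[1:]
restored = recall(weights, noisy_cue)
print(restored == PATTERN_A)
```

Because the learned weights encode which units tend to fire together, a partly corrupted cue is pulled back towards the stored pattern it most resembles.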
Figure 7: The network’s representation of a melody
The results of the training sequence are clearly correlated with Krumhansl’s findings.
Toiviainen describes his network as a connectionist model, which truly highlights the idea of
emergence in relation to the hierarchical structure of a chromatic music scale. Figure 8 indicates
the frequency of occurrence of certain notes within the network. The white nodes indicate the
input (learning) phase, and the black nodes indicate the output (production) phase.
Figure 8: Input and Output of Toiviainen’s neural network (1996)
The network was similarly trained and run in accordance with different constraints set
by Toiviainen and his colleagues. Using the same basic network architecture, he analyzed the
frequency of occurrence of the individual chords of the chromatic C major scale. Each note of
the scale can be the root note (first note) of a chord; consequently, there are twelve chords
represented in Figure 9, which demonstrates the difference between the input/learning phase
(white nodes) and the output/production phase (black nodes) in the form of a graph (Toiviainen,
1996).
Toiviainen indicates that there is a clear tendency of the network to emphasize (or
de-emphasize) notes in the scale depending on their tonal function (Toiviainen, 1996). Music that
utilizes a greater number of “unimportant” or infrequent tones (such as C# and G#, as seen in
Figure 9) is frequently perceived as being abstract, obscure, or unpleasant. This is due to the
general architecture of the auditory cortex, in addition to the listener’s expectations. Some
musical artists intentionally use such tones in order to elicit a desired emotion or reaction in their
listening audience (Lehrer, 2007).
Figure 9: Frequency of tonal occurrences in an output of the neural network.
Toiviainen’s goals for this study were to present and evaluate the output of a network
associated with tonal hierarchy, as well as to critically evaluate the output and suggest means of
improvement. He lists ways in which to improve the methodology and help to more accurately
model music in a scientific way, which are not critically relevant to the current investigation.
This neural network was one of his less complex demonstrations, making it feasible to discuss
within the context of this overview (Toiviainen, 1996).
In contrast to the first example of a musical neural network, I would now like to present a
recurrent connectionist network which is designed to perform music composition. The network is
called CONCERT (an acronym for CONnectionist Composer of ERudite Tunes), and it was
developed by Michael C. Mozer of the University of Colorado at Boulder (Griffith & Todd,
1999). This network composes music solely by imitation and prediction based upon the training
sequences to which it is exposed. It incorporates the three musical aspects of pitch, note duration,
and harmonic structure into its composition. CONCERT was trained using multiple sets of both
Johann Sebastian Bach pieces and traditional European folk melodies (Griffith & Todd, 1999). I
will now briefly explain the network’s basic architecture in addition to its methods of learning
and its process of composition.
CONCERT’s architecture is composed of various levels of layers and is very clearly
recurrent. Not surprisingly, its architecture is similar to that of Toiviainen’s network; they are both the
same type of neural network, making use of the Kohonen algorithm. A melody is presented to
the network in a note-by-note fashion, so the input node in this network could be represented by
the “Current Note” node. Mozer indicates that the “Context” node of the network acts as a layer
which “can represent relevant aspects of the input history, that is, the temporal context in which a
prediction is made” (Griffith & Todd, 1999). This layer acts as a form of short-term memory for
the network.
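The notion of a context layer acting as short-term memory can be illustrated with a deliberately simplified Python sketch. Here the context is an exponentially decaying trace of the notes heard so far, so that recent notes are represented more strongly than older ones. This is an assumption made purely for illustration: in a recurrent network such as CONCERT the context representation is learned through recurrent connections rather than fixed by a decay constant. The decay value, the twelve-unit pitch-class coding, and the example melody are hypothetical.

```python
def update_context(context, note, decay=0.5):
    """Fade the previous context, then add the current note at full strength."""
    new_context = [decay * value for value in context]
    new_context[note] += 1.0
    return new_context

def context_after(melody, size=12, decay=0.5):
    """Present a melody one note at a time and return the final context vector."""
    context = [0.0] * size
    for note in melody:
        context = update_context(context, note, decay)
    return context

# Hypothetical melody as pitch classes: C (0), then E (4), then G (7).
trace = context_after([0, 4, 7])
print(trace)
```

After the three notes are presented, the most recent note has the largest value in the trace and the earliest note the smallest, which is the sense in which the layer carries the temporal context of a prediction.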
The information flows from the “Context” node to the next two nodes, which are “Next
Note Distributed (NND)” and “Next Note Local (NNL)”. They both incorporate and represent
the three aspects of music listed previously in this review: pitch, duration, and harmonic
structure. Mozer indicates that these layers “contain CONCERT’s internal representation of the
note” (Griffith & Todd, 1999). Finally, the prediction of the next note is represented in the output
layer of the network, labeled as the “Note Selector” node (Griffith & Todd, 1999). A pictorial
representation of CONCERT’s architecture can be seen in Figure 10 below.
Figure 10: CONCERT’s architecture
With the network’s architecture in mind, it is important to discuss its method of
composition. Mozer refers to its compositional technique as “algorithmic music composition”
(Griffith & Todd, 1999). He defines this as the network’s ability to select notes in a sequential
and logical order according to a specific and pre-programmed table. This table gives a numerical
representation of the probability that one note will transition into another, and is cited multiple times
in Griffith and Todd’s discussion of Mozer’s network (Griffith & Todd, 1999). This transitional
probability seems similar to the work done by Krumhansl which was discussed earlier in this
paper, although she is not credited with the development of this table. According to Mozer, it is
possible to manipulate and develop individual transitional tables in order to exemplify specific
musical styles. The one which Mozer utilized for the development of CONCERT is apparently
based upon the transitional probabilities found within traditional European folk music (Griffith &
Todd, 1999).
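Composition from a table of transition probabilities can be sketched in Python as follows. The sketch builds a first-order table by counting which note follows which in a training melody, then selects each new note at random in proportion to those probabilities. The training melody, the function names, and the fixed random seed are hypothetical, and the table here is derived from a toy sequence rather than taken from the table Mozer describes.

```python
import random
from collections import Counter, defaultdict

def build_transition_table(melody):
    """Estimate P(next note | current note) by counting adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(melody, melody[1:]):
        pair_counts[current][following] += 1
    table = {}
    for note, followers in pair_counts.items():
        total = sum(followers.values())
        table[note] = {nxt: n / total for nxt, n in followers.items()}
    return table

def compose(table, start, length, seed=0):
    """Generate a melody one note at a time from the transition table."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        options = table.get(melody[-1])
        if not options:  # no known continuation for this note
            break
        notes = list(options)
        probabilities = [options[n] for n in notes]
        melody.append(rng.choices(notes, weights=probabilities, k=1)[0])
    return melody

# Hypothetical training melody in C major.
TRAINING_MELODY = ["C", "D", "E", "C", "E", "G", "E", "D", "C", "D", "E", "G", "C"]
table = build_transition_table(TRAINING_MELODY)
tune = compose(table, "C", 8)
print(table["C"])
print(tune)
```

Swapping in a table estimated from a different repertoire would change the style of the output, which is the sense in which separate tables can exemplify separate musical styles.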
Finally, I will discuss the way in which CONCERT is trained. During the learning
process, the Kohonen algorithm is utilized to present one note at a time to the network. The way
in which the network composes music is almost identical to the way in which it is trained, which
is not surprising. The network composes one note at a time, basing its next ‘decision’ upon the
previous note in the sequence. Mozer also indicates that CONCERT uses a form of
backpropagation, which is evident upon simply looking at the architecture (Griffith & Todd,
1999).
On a final note, it is interesting that Mozer appears to have experienced some degree of
success with CONCERT. He indicates that music is easier to simulate with a neural network than
natural language; music’s finite grammar in conjunction with its “psychoacoustic and stylistic
regularities” (Griffith & Todd, 1999) make it relatively easy to model.
Further Considerations
This paper only briefly touches upon the applications of neural networks to the modeling
of music cognition. As a result, there are many ideas that still need to be addressed with respect
to the combination of the sciences and the humanities. One issue with combining neural science
and music is a lack of global coherence, mentioned in the discussion of CONCERT. Composers
of music can be influenced by an infinite number of things, and individual experience and
cultural perceptions are perhaps the two most prominent. Neural networks have no true
‘individual experience’, except for the training sequences to which they are subjected. “The
difficulty is in deriving this knowledge in an explicit form: even human composers are unaware
of many of the constraints under which they operate” (Griffith & Todd, 1999).
Although it is possible to imitate a sense of randomness with a chaotic weight in a
network, it is strange to imagine a neural network improvising in a cool jazz style, such as that of
Miles Davis. As has been demonstrated with the help of O’Reilly and Munakata, it is important
to remember that neural networks are one tool of many that we can use to analyze our human
experience.
As an amateur musician myself, I find it slightly strange to imagine a neural network
composing pieces of music which could attain such popularity as Beethoven’s great symphonies.
However, this intellectual issue could potentially come to the forefront of the modern musical
community. As our culture progresses into the twenty-first century at an alarming speed, our
technologies advance at a similar rate. Perhaps new methods of computational processing will be
developed in which music can be composed creatively, as opposed to in a merely imitational
way.
I believe that in the coming years, C. P. Snow’s “third culture” will become more of a
reality than he had ever imagined. Even in my own personal experience, I am noticing the
importance of a multifaceted background, both academically and otherwise. In today’s
interconnected global community, it is critical that individuals be able to make connections that
span more than one field of interest or study. Theoretically, persons with multidimensional
backgrounds will move to the forefront of the intellectual community and help to create a
foundation of understanding between the scientists and the artists of the twenty-first century. It
is my hope that this paper illuminates one small way in which the gap between the sciences and
the humanities can be bridged, and potentially even closed sometime in the near future.
References
Brown, D. E. (1991). Human universals. Philadelphia: Temple University Press.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Graves, N. C. (1971). The two culture theory in C. P. Snow's novels. Hattiesburg: University and
College Press of Mississippi.
Griffith, N., & Todd, P. M. (1999). Musical networks: Parallel distributed perception and
performance. Cambridge, Mass: MIT Press.
Kleinsmith, A. (2009). Effects of musical properties upon emotion perception: A study in
psychoacoustics. Oswego, NY: Unpublished manuscript.
Lehrer, J. (2007). Proust was a neuroscientist. Boston: Houghton Mifflin Co.
Levitin, D. J. (2007). This is your brain on music: The science of a human obsession. New York:
Plume.
O'Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience:
Understanding the mind by simulating the brain. Cambridge, Mass: MIT Press.
Peretz, I., & Zatorre, R. J. (2003). The cognitive neuroscience of music. Oxford: Oxford
University Press.
Thompson, W. F. (2009). Music, thought, and feeling: Understanding the psychology of music.
Oxford: Oxford University Press.
Toiviainen, P. (1996). Modelling musical cognition with artificial neural networks. Jyväskylä:
University of Jyväskylä.