Visualisation of Music (VisualBox)

VisualBox An OpenGL Music Visualiser Alexander Conrad Stevens 41719882 Visualization, Computer Graphics & Data Analysis Computer Graphics Project April 2012.


A simple implementation of a music visualizer using Python and OpenGL


  • VisualBox: An OpenGL Music Visualiser

    Alexander Conrad Stevens (41719882)

    Visualization, Computer Graphics & Data Analysis: Computer Graphics Project

    April 2012.

  • Contents

    1 Project Overview

    1.1 Introduction

    1.2 Aim of Project

    2 The VisualBox Platform

    2.1 The Development Environment

    2.2 Sampling and Playing the Sound

    2.3 The Graphics Library

    2.4 Miscellaneous Python Libraries

    3 Design of Visualisation

    3.1 Inspiration

    3.2 Design Direction

    4 Implementation

    4.1 GStreamer: Playing and Decoding

    4.2 NumPy and the Fast Fourier Transform

    4.3 OpenGL, GLU and GLUT

    5 Results and Conclusions

    5.1 Result

    5.2 What Could Be Improved?

    5.3 Conclusions

    Appendices

    A Program listings

  • Chapter 1

    Project Overview

    1.1 Introduction

    Stimulation through sound is such a large industry in today's culture that there are many ways of satisfying the desire for auditory stimulation: playing an instrument, playing music through a device, or attending live concerts. These certainly stimulate the listener's sense of hearing, but they do not generally stimulate the sense of sight. This is the gap that VisualBox, like many other music visualisers, aims to fill.

    1.2 Aim of Project

    The general idea of a music visualiser is to take properties of the input music (frequency, level, tempo, etc.) and convert them into a visual representation of the sound. This can be done in 2D or 3D; this project focuses on 3D visualisation. The aim of the project is therefore to develop a 3D visualisation that takes at least frequency and level samples from the provided music and converts them into an aesthetically pleasing and entertaining 3D animation. The 3D world does not need to be interactive, since the music being played through the visualiser can be considered the interactive medium. However, the visual representation must make it clear to the viewer that the world is being manipulated by the music.


  • Chapter 2

    The VisualBox Platform

    2.1 The Development Environment

    For simplicity, and because of prior knowledge of and experience with development on the Ubuntu operating system, the beta of Ubuntu 12.04 was chosen for developing VisualBox. The Ubuntu platform is quick and provides many libraries at the developer's disposal.

    Python was chosen as the development language. It makes it quick to prototype simple (and complicated) operations on the fly, and provides good insight into what is happening within the program.

    2.2 Sampling and Playing the Sound

    Since VisualBox is being developed on Ubuntu, it made sense to use a multimedia framework that is installed by default and has a highly customisable pipeline. From a developer's standpoint, the obvious choice is the GStreamer multimedia framework.

    GStreamer allows existing elements to be reused to play any audio file format it supports, and lets the developer modify the decoder, muxer, sinks, pads and many other parts of the pipeline. This allows decoded sound samples to be captured and played at the same time.


    2.3 The Graphics Library

    Once again, since Ubuntu is the operating system of choice for VisualBox, libraries that are compatible with it are needed. This immediately ruled out Direct3D, since it is a Windows/Xbox-exclusive technology. That leaves the Simple DirectMedia Layer (SDL) and OpenGL.

    SDL is an attempt to standardise input and output across operating systems and platforms, including audio, keyboard and mouse input. However, since the visualiser does not need keyboard or mouse input and already has a means of playing music, SDL would be redundant. This leaves OpenGL, along with its GLU and GLUT libraries, to develop upon. Working with OpenGL directly also gives a more in-depth learning experience of how OpenGL functions than a higher level of abstraction such as SDL would.

    2.4 Miscellaneous Python Libraries

    Libraries such as SciPy/NumPy (numerical algorithms and mathematics) and Python's threading library are used to supplement the GStreamer (py-gst) and OpenGL (python-opengl) libraries. Their use is straightforward and is not explained further except where mentioned.

  • Chapter 3

    Design of Visualisation

    3.1 Inspiration

    Achieving an entertaining and aesthetically pleasing visualisation can be difficult. People have different opinions on what they find pleasing, so a topic with broad appeal was chosen.

    Outer space is a mysterious and often unpredictable place, and many cultures share a fascination with the void beyond our own small planet. From a designer's standpoint, the Sun, the stars and their mysterious secrets are therefore an interesting and appealing direction for a visualisation. For this reason, a binary system made up of a star and a black hole was chosen to appeal to a large audience.

    Figure 3.1: Example of Black Hole and Star Binary System


    3.2 Design Direction

    As can be seen in Figure 3.1, there are solar flares, activity on the surface of the star, and the gravitational effect of the black hole slowly devouring the larger star. In real time, this process is unlikely to be very dynamic at the macro level. However, the activity on the surface of the star is quite dynamic and can be considered micro-level activity. If this micro-level activity were represented at the macro level, a visualiser could show the dynamic properties of the music on the surface of the star, for example as many particles or shapes skipping about and moving around the star to the beat of the music.

    VisualBox, however, assigns a frequency to each particle, and the level of that frequency dictates a miniature solar flare. With music that has infrequent, heavy beats, this gives the effect of an exploding star. This satisfies the requirement that a viewer can tell the visualiser is driven by the music and is not just a pre-set loop. Since each particle (or solar flare) has its own frequency, particles jump by different amounts across the surface of the star.

    When a solar flare jumps high off the star and close to the black hole, one would expect the black hole to trap the flare in a spinning orbit until its impending doom. VisualBox implements this simply: if a particle comes too close to the black hole, it is trapped in orbit. Once in orbit, the particle retains its assigned frequency, but instead of jumping (or flaring), it speeds up and slows down according to the sound level. After a randomly assigned time, the particle decays and is finally reinitialised on the surface of the star. Any particles in an idle state simply roam across the surface of the star.

    To populate the black void of space, simple stars are added; their only purpose is to add depth and a sense of vastness. To show the depth of the 3D scene to the viewer, the camera also rotates about the scene, using the centre of the star as its reference point.

  • Chapter 4

    Implementation

    4.1 GStreamer: Playing and Decoding

    GStreamer is a pipeline-based multimedia framework: a developer can take a stream input (file, device, etc.) and pass the media data down the pipeline to be modified, decoded, played, saved to a file, or simply discarded. Its elements and plug-ins can be rearranged to suit the developer's needs.

    Figure 4.1: Example of a simple audio pipeline in GStreamer

    GStreamer also has a very well implemented element called Playbin2, which uses the GStreamer codecs already installed on the viewer's system to play any video or audio file with little to no effort from the developer. This satisfies the requirement that VisualBox must play the music alongside the visualisation, but it does not link the decoded music to the visualisation.

    This is where the pipeline framework becomes extremely useful. The developer only has to build a separate bin, plug it into Playbin2 in place of its audio-playing sink (the ALSA element), and route the stream to both a custom audio-playing sink and a decoded-output sink, similar to the layout shown in Figure 4.2.

    Figure 4.2: Layout of modified audio pipeline in GStreamer

    The audio sink plays the music, while the decoded-output sink sends the pull-buffer signal, allowing the decoded Pulse Code Modulated (PCM) audio data to be saved. This PCM data is channel-interleaved (left-channel sample, right-channel sample, left, right, etc.) and, as specified in the GStreamer initialisation, has a depth of 16 bits, as can be seen in Figure 4.3. The data is saved as an array of 16-bit integers. To allow the two components to work in parallel, the GStreamer component of VisualBox runs in its own thread alongside the OpenGL component.
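    The hand-off between the GStreamer thread and the OpenGL thread can be sketched with Python's threading module and NumPy. This is a minimal illustration under stated assumptions, not VisualBox's own code. The class name LatestBuffer is hypothetical. The sketch assumes the decoded buffer arrives as raw bytes of signed 16-bit little-endian PCM, and it keeps only the newest buffer, in line with the draw loop reading whatever was pulled most recently.

```python
import threading

import numpy as np


class LatestBuffer:
    """Hypothetical hand-off point between the GStreamer and OpenGL threads.

    store() is called from the audio thread with the raw bytes of one decoded
    buffer; take() is called from the draw loop. Only the newest buffer is kept.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._samples = None  # interleaved int16 samples: L, R, L, R, ...

    def store(self, raw_bytes):
        # Assumes signed 16-bit little-endian PCM; the negotiated caps decide
        # the real byte order. copy() detaches the array from the bytes object.
        samples = np.frombuffer(raw_bytes, dtype="<i2").copy()
        with self._lock:
            self._samples = samples

    def take(self):
        # Hand back the newest buffer (or None) and clear the slot, so each
        # buffer is processed at most once.
        with self._lock:
            samples, self._samples = self._samples, None
        return samples


if __name__ == "__main__":
    shared = LatestBuffer()
    # Simulate the audio thread delivering one stereo buffer of two frames.
    pcm = np.array([100, -100, 200, -200], dtype="<i2").tobytes()
    audio_thread = threading.Thread(target=shared.store, args=(pcm,))
    audio_thread.start()
    audio_thread.join()
    print(shared.take())
```

    The sketch overwrites older buffers instead of queueing them. This mirrors the trade-off described in Section 4.2, where samples that arrive between frames are simply not used.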

    Figure 4.3: Example of a Decoded PCM Audio Data Sample

    4.2 NumPy and the Fast Fourier Transform

    Once the decoded data has been acquired, the PCM samples need to be converted into something workable. Ideally, VisualBox needs a set of frequencies and their corresponding levels. This is where the NumPy library comes into play.


    NumPy provides a Fast Fourier Transform (FFT), an efficient algorithm for computing the Discrete Fourier Transform. The transform essentially converts a signal into its frequency components and their levels. VisualBox, however, does not need frequencies in hertz or levels in decibels; it only needs to know the dominant frequency regions of the music so that it can visualise them.

    Figure 4.4: Real Example of the Mirrored FFT output from VisualBox
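    The mirrored shape in Figure 4.4 is expected. For a real-valued input, the full FFT is conjugate-symmetric, so bin k and bin N - k have the same magnitude. The short NumPy sketch below shows this on a synthetic 1 kHz tone (a hypothetical test signal). It uses a buffer length of 1152, taken from the text. It also shows that `numpy.fft.rfft` returns only the non-redundant half. Whether VisualBox itself calls `fft` or `rfft` is not stated in the text.

```python
import numpy as np

RATE = 44100  # sample rate in Hz
N = 1152      # samples in one channel of a buffer (buffer size from the text)

# Synthetic 1 kHz test tone in the 16-bit amplitude range.
t = np.arange(N) / RATE
signal = 10000 * np.sin(2 * np.pi * 1000 * t)

full = np.abs(np.fft.fft(signal))   # N magnitudes, mirrored about N/2
half = np.abs(np.fft.rfft(signal))  # the N // 2 + 1 non-redundant magnitudes

# For a real input, bin k and bin N - k have the same magnitude: the "mirror".
mirrored = np.allclose(full[1:], full[1:][::-1])

peak_bin = int(np.argmax(half))
peak_hz = peak_bin * RATE / N  # bins are RATE / N (about 38.3 Hz) apart
print(mirrored, peak_bin, round(peak_hz, 1))
```

    At this buffer length, the frequency resolution is about 38 Hz per bin. That is coarse in hertz, but it is fine for picking out the dominant regions that VisualBox needs.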

    On each iteration of the OpenGL draw loop, VisualBox checks the sampled data, deinterleaves the left and right channels, applies the FFT to each, and saves the output to two arrays of floats containing the levels in order of frequency. The OpenGL visualiser then uses this raw data to compute particle effects and displacements in order of ascending frequency. The only drawback of this method is that not all samples are used. At a sample rate of 44100 Hz and a frame rate of around 20 frames per second, roughly 2205 samples per channel arrive between frames, but each buffer holds only 1152 16-bit samples; hence only about half of the decoded data is captured, computed and used. However, waiting for the OpenGL loop is less CPU intensive, since FFTs and deinterleaving are only computed when OpenGL can actually display the next frame.
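    The "about half" figure follows from a short calculation. The sketch below uses the numbers from the text and assumes that the 1152 samples are per channel.

```python
SAMPLE_RATE = 44100    # Hz, per channel
FPS = 20               # approximate OpenGL frame rate
BUFFER_SAMPLES = 1152  # samples per buffer (assumed to be per channel)

samples_per_frame = SAMPLE_RATE / FPS               # samples arriving per drawn frame
fraction_used = BUFFER_SAMPLES / samples_per_frame  # share of the audio that is analysed
print(f"{samples_per_frame:.0f} samples per frame, {fraction_used:.0%} used")
```

    This gives 2205 samples per frame, of which about 52% is analysed.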

    4.3 OpenGL, GLU and GLUT

    The OpenGL component of VisualBox (the class OpenGL Main) is where all of the graphical components are brought together and visualised. The class is initialised and started from a thread using its start() function. This initialises GLUT and sets up its display modes, display window, window-resize function, draw function and key-press event handlers, and then proceeds to the OpenGL initialisation component of VisualBox.

    Within the initialisation function (InitGL()), the depth test is configured to pass only objects whose depth is less than the stored depth (GL_LESS). Polygons are configured to render only the outside (GL_BACK) and to display only the polygon edge lines inside (GL_LINE); this speeds up rendering. GL_DEPTH_TEST is enabled so that objects are displayed in the correct depth order, and finally the shade model is set to GL_SMOOTH to give a smoother shade over objects. As a final touch, OpenGL is hinted to use the highest-quality rendering techniques instead of the most efficient ones (glHint with GL_NICEST), since VisualBox aims to provide an aesthetically pleasing experience.

    Figure 4.5: In order: Solar Texture, Star Sprite Texture, Particle Sprite Texture

    Next, the starmap is initialised in memory as static vertices, randomly spawned over a large area of the scene. Then the particles spawned over the Sun are initialised with random lifetimes, random inclinations and random colours (tending towards the red/orange spectrum) using the SphericalEmit() function. Each particle is also assigned a unique frequency that it keeps for the lifetime of the visualisation. Once this is complete, the textures are initialised using InitTexturing(); the solar texture is the only clamped texture, but all are loaded with alpha channels. These textures can be seen in Figure 4.5. Finally, the OpenGL draw sequence can start.
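    The starmap and particle initialisation can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the body of SphericalEmit(). `make_starmap`, `spherical_emit`, the cube extent, the lifetime range and the colour ranges are all hypothetical choices. Frequency bins are handed out in order and wrap around when particles outnumber bins, which is an assumption about how "unique" frequencies are assigned.

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def make_starmap(count, extent=50.0):
    """Static star vertices spread uniformly through a cube of half-width `extent`.

    `extent` is a hypothetical value; the text only says "a large area".
    """
    return rng.uniform(-extent, extent, size=(count, 3)).astype(np.float32)


def spherical_emit(count, radius, n_bins):
    """Hypothetical stand-in for SphericalEmit(): per-particle start state."""
    azimuth = rng.uniform(0.0, 2.0 * np.pi, count)
    # Drawing cos(inclination) uniformly spreads particles evenly over the
    # sphere instead of bunching them at the poles.
    inclination = np.arccos(rng.uniform(-1.0, 1.0, count))
    lifetime = rng.integers(100, 400, count)  # frames; arbitrary range
    # Red/orange colours: full red, some green, no blue.
    colour = np.column_stack(
        [np.ones(count), rng.uniform(0.2, 0.6, count), np.zeros(count)]
    )
    # Frequency bins handed out in order, wrapping if particles outnumber bins.
    freq_bin = np.arange(count) % n_bins
    return {
        "radius": np.full(count, float(radius)),
        "azimuth": azimuth,
        "inclination": inclination,
        "lifetime": lifetime,
        "colour": colour,
        "bin": freq_bin,
    }


stars = make_starmap(2000)
particles = spherical_emit(1200, radius=1.0, n_bins=577)
```

    Here 577 is the number of non-redundant FFT bins for a 1152-sample channel (1152 // 2 + 1).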


    The main draw function starts by clearing the buffers and then translates everything 3 units into the scene. This effectively positions the camera, ready to rotate the whole scene by half a degree every frame. Alpha testing is then enabled, passing a fragment only if its alpha is greater than 0.1. The blend function is also set to the standard configuration for transparency. This mainly matters for the background stars and the particles as they pass over each other and over solid objects.
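    The text does not list the exact calls. Assuming the common transparency set-up of glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) together with glAlphaFunc(GL_GREATER, 0.1), each incoming fragment colour $C_s$ with alpha $\alpha_s$ is combined with the colour $C_d$ already in the framebuffer as follows:

```latex
\[
C_{\text{out}} = \alpha_s\, C_s + (1 - \alpha_s)\, C_d ,
\qquad \text{fragment discarded unless } \alpha_s > 0.1
\]
```

    Under these assumptions, the alpha test removes the nearly transparent corners of each sprite texture before blending. This stops them from occluding objects behind them through the depth buffer.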

    Once these preceding configurations are in place, the ARB point sprites can be set up. Point sprites are essentially vertices in space that the hardware renders as a single textured square, rather than as a single pixel or an explicit quad. A distance-attenuation quadratic determines how distance affects the size of a star or particle; its coefficients are fairly arbitrary in this case and were chosen to give the best-looking effect for distant stars. The maximum point size is then specified, and the actual point size is set to that maximum. The only times a sprite should fade out of the scene are when it is too small to be perceived (when its size falls below 3.0, the value given for GL_POINT_FADE_THRESHOLD_SIZE_ARB) or when it is no longer on the screen.
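    The effect of the attenuation quadratic and the fade threshold can be modelled in plain Python. The model follows the size and fade rules described in the ARB_point_parameters extension. The derived size is the point size multiplied by sqrt(1 / (a + b·d + c·d²)) and clamped to the min/max range. Below the fade threshold, the sprite is drawn at the threshold size and its alpha is scaled by (derived / threshold)². The coefficients in the usage lines are hypothetical; VisualBox's own values are not given in the text.

```python
import math


def sprite_size_and_fade(size, distance, attenuation, size_min, size_max, fade_threshold):
    """Model of point-sprite size and alpha fade following ARB_point_parameters.

    attenuation = (a, b, c): constant, linear and quadratic coefficients.
    Returns (rasterised size, alpha multiplier).
    """
    a, b, c = attenuation
    derived = size * math.sqrt(1.0 / (a + b * distance + c * distance ** 2))
    derived = min(max(derived, size_min), size_max)
    if derived >= fade_threshold:
        return derived, 1.0
    # Too small to see clearly: draw at the threshold size and fade alpha,
    # so distant sprites dim smoothly instead of vanishing abruptly.
    return fade_threshold, (derived / fade_threshold) ** 2


# Hypothetical coefficients: constant 1, no linear term, quadratic 0.01.
for d in (0.0, 10.0, 1000.0):
    print(d, sprite_size_and_fade(64.0, d, (1.0, 0.0, 0.01), 1.0, 64.0, 3.0))
```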

    With the point sprites set up, the stars can finally be rendered into the scene. The star sprite texture is bound to the point sprites, and the pre-calculated starmap vertices are sent inside a glBegin(GL_POINTS) block. The star vertices are kept static throughout the whole visualisation. The particles orbiting the Black Hole and the Sun, however, are dynamic.

    Generation of the dynamic particles starts only once GStreamer has signalled that a buffer is ready to be pulled and the buffer has been stored. VisualBox then deinterleaves the stereo channels that make up the single decoded audio buffer into two arrays called lftch and rhtch. The data is then passed through a Fast Fourier Transform and scaled down by five orders of magnitude. The particle texture is bound for the following point sprites, and the point sprite size is set to 2. When the particle point sprites are sent to the buffer, every even-numbered particle uses the left channel's level at its frequency, and every odd-numbered particle uses the right channel's. This lets the visualisation reflect the stereo balance of the music, including beats that are stronger in one channel than the other.
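    The deinterleave, transform, scale and even/odd assignment steps can be sketched with NumPy. This is a minimal illustration, not the VisualBox source. The function names are hypothetical. Two choices are assumptions made for this sketch: taking the magnitude of the complex FFT output, and using `rfft` to drop the mirrored half. The 1e-5 factor corresponds to "scaled down by five orders of magnitude".

```python
import numpy as np


def channel_levels(samples):
    """Split interleaved int16 PCM into lftch/rhtch and return scaled FFT levels."""
    lftch = samples[0::2].astype(np.float64)  # even indices: left channel
    rhtch = samples[1::2].astype(np.float64)  # odd indices: right channel
    # Magnitude spectrum of each channel, scaled down by five orders of magnitude.
    left = np.abs(np.fft.rfft(lftch)) * 1e-5
    right = np.abs(np.fft.rfft(rhtch)) * 1e-5
    return left, right


def particle_levels(left, right, particle_bins):
    """Even-numbered particles read the left spectrum, odd-numbered the right."""
    idx = np.arange(len(particle_bins))
    return np.where(idx % 2 == 0, left[particle_bins], right[particle_bins])
```

    Slicing with a step of 2 deinterleaves the buffer without a Python loop. NumPy does this work in compiled code, which matters given the performance limits reported in Section 5.1.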

    Each particle is then given a nudge (through randomNudge()) according to its assigned frequency. This nudge simulates a solar flare, expelling the particle by a displacement dictated by the level of the associated frequency. If the particle does not enter the radius of the Black Hole (half the distance between the Black Hole and the Sun) and get trapped, it returns to an orbit position on the surface of the Sun and continues to shift and roam randomly along the surface. If, however, the particle is caught in the grip of the Black Hole, it assumes an orbit around the Black Hole, with a speed again dictated by the sound level of its associated frequency. Once trapped, the particle counts towards its designated lifetime; when the count reaches it, the particle is removed from the Black Hole and re-emitted on the Sun using SphericalEmit() again.
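    The per-particle logic above can be sketched as a small state machine: flare, capture check, orbit with a level-driven speed, and lifetime countdown. This is a hypothetical stand-in for randomNudge(), not its actual body. `Particle`, `step`, `random_direction`, `flare_gain` and `spin_gain` are illustrative names and parameters. The Sun is assumed to sit at the origin, and random roaming on the surface is omitted for brevity.

```python
import math
import random
from dataclasses import dataclass


def random_direction():
    # Uniformly random unit vector from a normalised Gaussian sample.
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return tuple(x / n for x in v)


@dataclass
class Particle:
    direction: tuple      # unit vector from the Sun's centre
    freq_bin: int         # assigned frequency bin
    lifetime: int         # frames to spend orbiting the Black Hole
    height: float = 0.0   # current flare height above the Sun's surface
    trapped: bool = False
    age: int = 0          # frames spent trapped so far
    angle: float = 0.0    # orbital angle around the Black Hole


def step(p, level, sun_radius, black_hole, flare_gain=1.0, spin_gain=0.2):
    """One frame of a hypothetical randomNudge()-style update (Sun at origin)."""
    if p.trapped:
        p.angle += spin_gain * level  # louder level -> faster orbit
        p.age += 1
        if p.age >= p.lifetime:
            # Lifetime reached: re-emit on the Sun's surface.
            p.trapped, p.age, p.height = False, 0, 0.0
            p.direction = random_direction()
        return p
    p.height = flare_gain * level  # flare height follows the level
    r = sun_radius + p.height
    pos = tuple(r * d for d in p.direction)
    # Capture radius: half the Sun-to-Black-Hole distance.
    capture = 0.5 * math.dist(black_hole, (0.0, 0.0, 0.0))
    if math.dist(pos, black_hole) < capture:
        p.trapped = True
    return p
```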

    Finally, the Sun, which is a simple gluSphere, is textured with the Solar Texture using texture coordinates generated by gluQuadricTexture, and displayed at the coordinate [0, 0, 0]. It is a static object for the entirety of the visualisation. The Black Hole, however, has no texture (since it is supposed to be black) and orbits the Sun with a randomly incremented radius and rotation. The Black Hole's next position is calculated by the blackHoleNudge() function. All of the calculations in VisualBox use a polar (spherical) coordinate system, which is converted to Cartesian coordinates to calculate the vertex positions around the Sun and the Black Hole. The Black Hole uses the same system to orbit the Sun.
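    The coordinate conversion and a random orbit step can be sketched as follows. The conversion assumes the physics convention, with inclination measured from the +z axis and azimuth in the x-y plane. The text does not say which convention VisualBox uses. `black_hole_nudge` and its radius limits and step sizes are hypothetical stand-ins for blackHoleNudge().

```python
import math
import random


def spherical_to_cartesian(r, inclination, azimuth):
    """Convert (radius, inclination from +z, azimuth in the x-y plane) to (x, y, z)."""
    return (
        r * math.sin(inclination) * math.cos(azimuth),
        r * math.sin(inclination) * math.sin(azimuth),
        r * math.cos(inclination),
    )


def black_hole_nudge(r, azimuth, r_min=2.5, r_max=3.5, max_dr=0.01, max_dphi=0.02):
    """Hypothetical blackHoleNudge(): small random steps in orbit radius and angle."""
    r = min(max(r + random.uniform(-max_dr, max_dr), r_min), r_max)
    azimuth = (azimuth + random.uniform(0.0, max_dphi)) % (2.0 * math.pi)
    return r, azimuth


# One frame: nudge the orbit, then place the Black Hole in the equatorial plane.
radius, angle = black_hole_nudge(3.0, 0.0)
print(spherical_to_cartesian(radius, math.pi / 2, angle))
```

    Keeping the state in polar form makes "randomly increment the radius and rotation" a one-line update. Converting to Cartesian coordinates is needed only when the vertex is drawn.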

  • Chapter 5

    Results and Conclusions

    5.1 Result

    VisualBox provided an entertaining experience while listening to music. It successfully used the frequency and corresponding level to calculate motion that could be clearly seen by the viewer. However, due to the method used to send the point sprites to the buffer, the performance of VisualBox was lower than expected on lower-end graphics hardware. In this case, the program was developed on an Intel Core i5-2557M with Intel HD Graphics 3000. This limited the number of point sprites on screen to about 1200 at 20 frames per second, instead of a potential 12000.

    Figure 5.1: Close up of the stars rendered in VisualBox

    Python was also considered a limiting factor, since it is an interpreted language and the amount of data passed around VisualBox was significant. Removing one of the audio channels from the deinterleaving and FFT process increased the frame rate of VisualBox. Performing these array manipulation routines in a language such as C would be expected to be far faster.


    Figure 5.2: Close up of Sun and Black Hole with Particles

    As can be seen in Figure 5.2, adding particles that randomly orbit the Sun and the Black Hole proved well worth including. The particles help to obscure the stretching of the Solar Texture (visible in the centre of the Sun if observed closely). The particles around the Black Hole show that there is actually a body at the centre of an otherwise almost entirely black object, since they pass around and behind it.


    Finally, Figure 5.3 clearly shows all of the elements coming together in a neat and entertaining fashion. The user does not need to provide any input to the visualisation, and the camera slowly revolves about the Sun, giving a sense of depth and dimension.

    Figure 5.3: A still of the whole scene from within VisualBox

    5.2 What Could Be Improved?

    As discussed in the previous section, Python was considered a restriction in terms of performance, so a compiled language such as C could be used instead of an interpreted language like Python.

    OpenGL Vertex Buffer Objects (VBOs) should also have been used, instead of the

    slow process of looping through an array of vertices and sending them to the buffer

    using glBegin() and glEnd().
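    The first step towards VBOs is to keep vertex data in one contiguous typed array instead of a Python list that is walked vertex by vertex. The NumPy sketch below shows only that packing step. `pack_vertices` is a hypothetical helper. The interleaved x y z r g b layout is an assumption, and the actual upload (glGenBuffers, glBindBuffer, glBufferData) is left out because it needs a live OpenGL context.

```python
import numpy as np


def pack_vertices(positions, colours):
    """Interleave xyz positions and rgb colours into one contiguous float32 array.

    Such an array can be handed to OpenGL in a single buffer upload, instead of
    one call per vertex between glBegin() and glEnd().
    """
    positions = np.asarray(positions, dtype=np.float32)
    colours = np.asarray(colours, dtype=np.float32)
    packed = np.hstack([positions, colours])  # shape (n, 6): x y z r g b
    return np.ascontiguousarray(packed)


# Byte distance between consecutive vertices in the interleaved layout.
STRIDE_BYTES = 6 * np.dtype(np.float32).itemsize
```

    With an interleaved layout, a stride of 24 bytes lets one buffer feed both the position and the colour attributes.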

    A true gravitational physics model could replace the simple displacement algorithm based on frequency and level. This would provide a much more dynamic and fluid experience: particles could follow real paths, with the sound level used to accelerate the flow of particles towards the Black Hole.
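    One way the suggested gravitational model could work is an inverse-square pull towards the Black Hole, integrated with a semi-implicit Euler step. In this sketch, the per-particle sound level scales the pull. It illustrates the proposed improvement under stated assumptions and was not part of VisualBox. `gravity_step`, `strength` and the softening term (which avoids division by zero very close to the attractor) are hypothetical.

```python
import numpy as np


def gravity_step(pos, vel, attractor, dt, strength, level, softening=0.05):
    """Semi-implicit Euler step pulling particles towards an attractor.

    pos, vel: (n, 3) arrays; attractor: (3,) array; level: (n,) sound levels.
    Louder levels strengthen the pull, accelerating the flow of particles.
    """
    offset = attractor - pos                                     # (n, 3)
    dist2 = np.sum(offset ** 2, axis=1, keepdims=True) + softening ** 2
    accel = strength * level[:, None] * offset / dist2 ** 1.5    # inverse-square
    vel = vel + accel * dt  # update velocity first...
    pos = pos + vel * dt    # ...then position with the new velocity
    return pos, vel
```

    Updating velocity before position (semi-implicit Euler) keeps orbits noticeably more stable than plain Euler at the same cost.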


    A beat-detection algorithm could also be added to find the Beats Per Minute (BPM) of a song. However, which algorithm is best to use is still a subject of debate, since different styles of music have different beat signatures.
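    As a concrete starting point, one of the simplest beat detectors compares the energy of each short block of samples with the average energy of the preceding second. A block is flagged as a beat when it clearly exceeds that average. The sketch below implements this energy-comparison approach; it is not the only option, and it is not something VisualBox used. The block size, history length and sensitivity are illustrative defaults, and the BPM estimate uses the median spacing between detected beats.

```python
import numpy as np


def detect_beats(samples, block=1024, history=43, sensitivity=1.3):
    """Return indices of blocks whose energy exceeds sensitivity x recent average.

    history=43 blocks of 1024 samples is roughly one second at 44.1 kHz.
    """
    samples = np.asarray(samples, dtype=np.float64)
    n_blocks = len(samples) // block
    energy = np.array(
        [np.sum(samples[i * block:(i + 1) * block] ** 2) for i in range(n_blocks)]
    )
    beats = []
    for i in range(history, n_blocks):
        local_avg = energy[i - history:i].mean()  # average of the previous second
        if energy[i] > sensitivity * local_avg:
            beats.append(i)
    return beats


def bpm_from_beats(beat_blocks, block=1024, rate=44100):
    """Estimate tempo from the median spacing between detected beat blocks."""
    if len(beat_blocks) < 2:
        return None
    seconds_per_beat = np.median(np.diff(beat_blocks)) * block / rate
    return 60.0 / seconds_per_beat
```

    A fixed sensitivity works poorly across genres, which reflects the debate mentioned above. Variants adjust the threshold using the variance of the recent energy, or run the detector separately on several frequency bands.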

    With the combination of all of these improvements, truly fluid particle motion, possibly with tens or hundreds of thousands of particles, could be implemented. This could potentially look like the inspirational Black Hole and Sun binary system shown in Figure 3.1.

    5.3 Conclusions

    Overall, VisualBox was a success, albeit with a few improvements still to be made. It is a three-dimensional visualiser that uses the frequency and level from sound samples to display an aesthetic and entertaining visualisation synchronised with the beat of the music.

    The code can now act as a base for developing even more complicated visualisations. It could be ported quite simply to other programming languages, and a plug-in could be created for various media players. VisualBox can be considered a prototype and a learning experience for other programmers looking into the world of GStreamer and OpenGL.

  • Appendix A

    Program listings

    Currently, VisualBox uses the Bazaar revision control system and stores its code on Launchpad.

    Install Bazaar, Python GStreamer, Python OpenGL, and SciPy/NumPy using:

    sudo apt-get install bzr python-gst0.10 python-opengl python-numpy

    And get the source code using:

    bzr clone lp:alex-stevens/+junk/VisualBox

    The code can also be viewed online at this address:

    http://code.launchpad.net/alex-stevens/+junk/VisualBox

    The revision that is referenced in this version of the document is revision 36.
