8th Annual CSIS Research Conference
Client Server Browsing of Sound Resources: Classification and Browsing
E. Brazil
Interaction Design Centre
University of Limerick
Ireland
Introduction
? - How to classify sound resources, and how to provide an interface for browsing them.
! - Provide a browsable sound database for users in intranet / Internet environments.
Overview of Research Areas
• Sound Classification
• Sound Representation
• Sound Browsing
Sound Classification
• Two levels of classification
• Coarse level
– Distinguish whether a sound is Speech, Music, Environmental, Silence or Other
• Fine level
– Use human perceptual features
Coarse-level classification of audio (1)
– Audio signals are classified into basic types, including speech, music, several types of environmental sounds, and silence
– Uses morphological and statistical analyses of short-time feature curves (energy function, average zero-crossing rate, fundamental frequency), together with a rule-based heuristic classification procedure
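A rule-based procedure of this kind could look roughly like the sketch below. It is an illustration only, not the procedure from the cited work: the function name `classify_coarse`, both thresholds, and the restriction to three categories are assumptions. It relies on the common observation that speech alternates voiced and unvoiced segments, so its zero-crossing curve tends to fluctuate more than that of music.

```python
from statistics import mean, pvariance

# Hypothetical thresholds, chosen only for illustration.
SILENCE_ENERGY = 1e-4
SPEECH_ZCR_VARIANCE = 50.0


def classify_coarse(energy_curve, zcr_curve):
    """Label a clip from its per-frame energy and zero-crossing curves."""
    # Rule 1: very low average energy is treated as silence.
    if mean(energy_curve) < SILENCE_ENERGY:
        return "silence"
    # Rule 2: a strongly fluctuating ZCR curve is treated as speech.
    if pvariance(zcr_curve) > SPEECH_ZCR_VARIANCE:
        return "speech"
    # Everything else falls through to music in this simplified sketch.
    return "music"
```

A fuller rule set would also need rules for environmental sounds and the "other" category, and thresholds tuned on real recordings.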
Coarse-level classification of audio (2)
• Short-time energy function
– Short-time energy of the audio signal reflects the amplitude variations over time
• Short-time average zero-crossing rate
– ZCR is the number of times the signal passes
through zero in a given time interval
• Spectral Centroid
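The three features above can be sketched in a few lines of Python. This is a minimal illustration using the standard library only. The function names, the frame splitting, and the naive DFT are assumptions made for the example, and a real system would use an FFT library instead.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a list of samples into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive DFT."""
    n = len(frame)
    total = 0.0
    weighted = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitude = abs(coeff)
        total += magnitude
        weighted += magnitude * k * sample_rate / n
    return weighted / total if total > 0 else 0.0
```

Applying each function to every frame returned by `frames` yields the short-time feature curves that the coarse-level classification works on.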
Fine-level classification of audio
• Further classification will be conducted within each basic type:
– music: classify music played by different instruments, different types of music, singing, plain song
– speech: differentiate the voices of men, women, and children, and speech with background music
– environmental sound: divide into classes such as applause, bell ringing, footsteps, windstorm, laughter, bird sounds, and so on
Sound Representation
• Previous work has concentrated on
– Visual star-field type display
• New visual representations
– Visualisations on spheres (non-Euclidean spaces)
– Hyper tree
– Excentric labeling
Star-field Display
Virtual University - Uni. Vienna
Visualisations on Spheres
H3: Laying Out Large Directed Graphs in 3D Hyperbolic Space - Munzner
Hyper Tree
www.inxight.com
Excentric Labeling
HCIL – Uni. Maryland
Sound Browsing
• Iterative & interactive activity
– Opportunistic & serendipitous
• Enable users to explore a data set
• External & internal properties of objects
– Context & content
• Evaluate and revise understanding of relationships
The Sonic Browser Application
Audio: Direct representation of tunes (exploiting the cocktail party effect)
• Sounds are panned out in a stereo field controlled by the visual location of the tunes nearest to the cursor.
• The volume of the tunes playing concurrently is proportional to the visual distance between the objects and the cursor
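The mapping from screen position to sound can be sketched as follows. This is not the Sonic Browser's actual code: the function name `gain_and_pan`, the `max_distance` cut-off, the linear fall-off, and the constant-power pan law are all assumptions. In particular, the sketch assumes the intent is that objects nearer the cursor sound louder.

```python
import math


def gain_and_pan(cursor, obj, max_distance):
    """Return (gain, left, right) for one sound object.

    Assumption: gain falls linearly from 1 at the cursor to 0 at
    `max_distance`; pan follows the horizontal offset from the cursor.
    """
    dx = obj[0] - cursor[0]
    dy = obj[1] - cursor[1]
    distance = math.hypot(dx, dy)
    if distance >= max_distance:
        return 0.0, 0.0, 0.0  # too far from the cursor to be heard
    gain = 1.0 - distance / max_distance
    # Map dx in [-max_distance, max_distance] to an angle in [0, pi/2]
    # for constant-power panning (left = cos, right = sin).
    angle = (dx / max_distance + 1.0) * math.pi / 4.0
    return gain, gain * math.cos(angle), gain * math.sin(angle)
```

An object directly under the cursor plays at full gain in the centre of the stereo field, while an object to the right of the cursor is quieter and weighted towards the right channel.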
The Sonic Browser Application
Client – Server Issues
• Let the server do the mixing and spatialisation
• Analysis and classification on the server
• Lightweight client - Java
• Different network topologies and protocols
– Latency issues
– Use of a floating ‘Aura’
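Server-side mixing could be sketched as below, so that the client receives one stereo stream instead of many separate sounds. The function name `mix_to_stereo`, the tuple layout, and the hard clipping are assumptions for illustration. The slides do not describe the actual mixing code.

```python
def mix_to_stereo(sources):
    """Mix mono sources into one stereo buffer.

    `sources` is a list of (samples, left_gain, right_gain) tuples, where
    `samples` is a list of floats in [-1.0, 1.0].
    Returns a list of (left, right) sample pairs.
    """
    length = max((len(samples) for samples, _, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, left_gain, right_gain in sources:
        for i, x in enumerate(samples):
            left[i] += x * left_gain
            right[i] += x * right_gain

    def clip(value):
        # Hard-clip so the summed signal stays in the valid range.
        return max(-1.0, min(1.0, value))

    return [(clip(l), clip(r)) for l, r in zip(left, right)]
```

The per-source gains would come from the cursor position reported by the client, which is where network latency becomes noticeable.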
Cue Points
• Use Cue Points as Marker Points
– Mark a specific point or section of a sound
• Play only significant portion of sound while browsing
• Reduce time to identify sound by playing characteristic or significant part
• Found in many common sound file formats*
* Technical Report UL-IDC-01-02
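As one example of such a format, RIFF/WAVE files can carry a `cue ` chunk holding a list of marker positions. The sketch below reads it with the standard library only. The function name `parse_cue_points` is hypothetical, and the code assumes an uncompressed WAV file held in memory, where the last field of each cue record is the sample offset.

```python
import struct


def parse_cue_points(wav_bytes):
    """Return [(cue_id, sample_offset), ...] from a RIFF/WAVE byte string."""
    if wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack_from("<4sI", wav_bytes, pos)
        body = pos + 8
        if chunk_id == b"cue ":
            (count,) = struct.unpack_from("<I", wav_bytes, body)
            cues = []
            for i in range(count):
                # Each 24-byte record: id, position, data chunk id,
                # chunk start, block start, sample offset.
                cue_id, _, _, _, _, offset = struct.unpack_from(
                    "<II4sIII", wav_bytes, body + 4 + 24 * i)
                cues.append((cue_id, offset))
            return cues
        pos = body + size + (size & 1)  # chunks are padded to even length
    return []
```

A browser could then start playback at the returned sample offset rather than at the beginning of the file.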
Application Platform: HW & OS
• Normal multimedia PC (Pentium II/III with SB Live, etc.)
• Server – MS Windows 98/2000
• Client – Any O/S with a Java Runtime
Conclusion
• Facilitate different visualisation tools, e.g. for non-Euclidean space.
• Address payment and copyright issues
• Investigate other file types, e.g. MPEG-7.
References (1)
• Brazil, E. (2001). Cue Points: An Examination Of Common Sound File Formats. Limerick, University of Limerick.
• Fekete, J. D., Plaisant, C. (1999). Excentric Labeling: Dynamic Neighborhood Labeling for Data Visualization. Conference on Human Factors in Computing Systems, New York, ACM.
• Fernström, M., Brazil, E. (2001). Sonic Browsing: An Auditory Tool For Multimedia Asset Management. International Conference on Auditory Display, Espoo, Finland.
• Ó Maidín, D., Fernström, M. (2000). The Best of Two Worlds: Retrieving and Browsing. COST-G6 Conference on Digital Audio Effects DAFx-00, Verona, Università degli Studi di Verona.
References (2)
• Shneiderman, B. (1996). The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. IEEE Symposium on Visual Languages, Boulder, CO, USA.
• Zhang, T., Kuo, C.C. (1998). Content-based Classification and Retrieval of Audio. SPIE's 43rd Annual Meeting - Conference on Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, San Diego.
• Zhang, T., Kuo, C.C. (1998). Hierarchical System for Content-Based Audio Classification and Retrieval. SPIE's Conference on Multimedia Storage and Archiving Systems III, Boston.