UCSD Site Report to the IAB Sheldon Brown Site Director Daniel Tracy CHMPR Programmer May 11, 2011...
UCSD Site Report to the IAB
Sheldon Brown
Site Director
Daniel Tracy
CHMPR Programmer
May 11, 2011
Baltimore, MD
Overview
Ongoing Projects
• Multi-User Extensible Virtual Worlds
• Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games
Extending the CHMPR
• Future Cinema – has revised focus on Augmented Reality
• FRP and RapidMRI projects underway
• REU – two new undergraduates involved in research
• Complementary project with NSF EAGER grant
– “Identifying and Integrating Creative Patterns of User Behavior and Experience in Virtual Worlds”
Staff
• Sheldon Brown, Site Director
– Erik Hill, Programmer Analyst
– Daniel Tracy, Programmer Analyst
– Todd Margolis, Programmer Analyst
– Kristen Kho, Programmer Analyst
– Jeremy Douglass, Post-Doc Researcher
– Vivek Ramavajjala, Graduate Student
– Sam Kronik, Graduate Student
– Robin Betz, Undergraduate Student
– Bradley Ruoff, Undergraduate Student
– Lourdes Guardiano-Durkin, Administrator
Projects
Ongoing Projects
• Multi-user Extensible Virtual Worlds
• Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games
Revised Project
• Future Cinema as Augmented Reality
Affiliated Projects
• Identifying and Integrating Creative Patterns of User Behavior and Experience in Virtual Worlds
Products and Activities
Last six months:
Virtual World Exhibitions
• CSU Sacramento
• UCSD 50th Anniversary Innovation Expo
Next Generation Cinema Presentations
• Presentation by Justin Rattner, Intel
• Featured on French/German TV: Souvenirs from Earth
• Ukraine: Video Art in a Global Context Exhibition
• Mexico Moving Forward
• College Art Association New York
• 3D movie featured at Seoul Korea Film Festival
• Scalable City wins first prize in Sony Europe 3D movie competition
Lectures
• Varieties of Virtual World Experience via Multicore Computing at the Frontiers of Multicore Computing
• Intel Labs Radio Show
• Keynote talk for NEA/NSF Summit at RPI
• EMPAC: one of the keynote talks
Publications
Tracy D., Brown S. Combining Parallel & Incremental Techniques for Real-Time Physics in Large, Continuous Virtual Environments. Journal of Computing and Concurrency – pending publication.
Website: http://chmpr.ucsd.edu
Multi-user Extensible Virtual Worlds
Status: Continuing
• Project Description: Multi-user Extensible Virtual Worlds
In order for virtual worlds to realize their potential across a number of areas of industry and research domains, along with serving as generally effective social forums, their expressive qualities need to be significantly improved upon. They require a considerable increase in the quantity and quality of entities and their interactions. This also entails a substantial increase in the sizes of virtual worlds, the number of users that are able to be supported, the variety of objects and behaviors and the simultaneity of entity interactions.
• Sponsors: IBM, Intel
• Deliverables:
– Prototype Multi-user Virtual World ongoing development
• Major results
– Optimizing client-server operations. Integrating compute accelerators.
Scalable City: Massive Scale Virtual Worlds
• Massively multiplayer continuous world
• Hundreds of thousands of interactive objects
• Large aggregate bandwidth requirements
Challenges/Issues
Optimization, feature development, workable across heterogeneous clients
Goals
• Scalability
– Support large environments, massively multi-player
• Hybrid, Multi-platform server
– z10, x86, CellBE, Tesla accelerators
• Performance
– Clients need to perform well on a range of desktop computer configurations
Increasing complexity of objects and interactions with increasing world size, users, numbers of objects and types of interactions.
Server services are distributed across cloud clusters, and redistributed across clients as performance or local work necessitates. Coherency with the overall system is managed by a centralized server. Virtual world components have dynamic tolerance levels for incoherency and latency.
Compute accelerators handle asset transformation, physics and behaviors.
Multiple 10 Gb interfaces to compute accelerators, storage clusters and compute cloud.
Server system keeps track of world state.
Development Server Framework 5/2010
• IBM z10 mainframe computer at San Diego Supercomputer Center
• 2 IFLs with 128 MB RAM, z/VM virtual OS manager with Linux guests
• 6 TB fast local storage – 15K disks
• 4 SR and 2 LR 10 Gb Ethernet interfaces
• 4 QS20 blades – 8 Cell CPUs
• 2 QS22 blades – 4 Cell CPUs
• 8 HS22 blades – 16 Xeons – 96 cores
• 4-way Xeon server – 32 cores
• nVidia Tesla accelerator – 4 GPUs on Linux host, external dual PCI connection
• 3 10 Gb interfaces to compute accelerators
• 1 10 Gb interface to internet
• Many clients
How do you program a distributed heterogeneous system?
Server manages various virtual world processes. Use compute accelerators for compute-intensive, parallelizable subsystems such as physics. Two-phase approach:
Different systems for different underlying architectures return compatible results
• Xeon blades running Scalable Physics Engine, x86 optimized
• GPUs or novel architectures run Bullet engine
• Distribute heavy computational stages
– Collision detection on broad phase pair output
– Constraint solving/integration on contact groups
Long-term approach: OpenCL plan
• Develop physics system using algorithms well-suited to OpenCL parallelization
• Applicable to both object collisions and deformation
• Same code base for different hardware – host or server-side accelerators
• Parallelization occurs throughout the physics pipeline
• Linearly scalable to availability of hardware resources
• Similar approach for other aspects of asset computation
Multi-user Environment
• Server Goals:
– 10,000 players on 1,000 cities
• Performance Challenges
– Communication: 14.2 GB/sec to clients
– Physics: 200,000 active objects
– Rendering: x10 particle system complexity
Communication
• Fast & non-redundant data marshalling/archiving
– “Player data-sharing” optimizations
• Generating assets deterministically on client
– Removes need to communicate resources
• Reduced client/server synchronization frequency
– Client-side interpolation
• Further tweaks to reduce bandwidth
– Messages consolidated, compressed
• Adaptability to Client Hardware
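The client-side interpolation mentioned above can be illustrated with a minimal sketch. It assumes the client keeps the two most recent server snapshots of an object and blends linearly between them at render time; the `Snapshot` type and the `interpolate` function are hypothetical names for illustration, not taken from the Scalable City code.

```cpp
#include <cassert>

// Hypothetical snapshot of one object's position as received from the server.
struct Snapshot {
    double time;      // server timestamp in seconds
    double x, y, z;   // position
};

// Linearly interpolate between two server snapshots at render time t.
// t is clamped to [a.time, b.time], so the client holds the last known
// state instead of extrapolating when no newer snapshot has arrived.
inline Snapshot interpolate(const Snapshot& a, const Snapshot& b, double t) {
    assert(b.time > a.time);
    double u = (t - a.time) / (b.time - a.time);
    if (u < 0.0) u = 0.0;
    if (u > 1.0) u = 1.0;
    return {a.time + u * (b.time - a.time),
            a.x + u * (b.x - a.x),
            a.y + u * (b.y - a.y),
            a.z + u * (b.z - a.z)};
}
```

With a lower synchronization frequency the gap between snapshots grows, and the clamp is what makes a stalled server visible as a frozen object rather than a runaway one.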
Heterogeneous Client Support
• Client machine profiling
– Processing power (CPU, GPU, # of cores)
– Rendering performance
– Networking latency/bandwidth
• Dynamic fidelity adjustment
– Graphics effects
• Shadows, volumetric rendering, particle systems…
– Planned: Compute/synchronization trade-off
Future Work: Client-side Predictive Physics
• Interpolation smooths movement until server stalls
– If server increases lag, there is nothing to interpolate to!
• Inject copy of Server functionality into Client
– Performs same work on subset of data for prediction
• Server state may differ from prediction
– Client interpolates what user sees during correction
• Allows us to decrease synchronization latency much further
– Update frequency adjustable based on client process/network
Tool of Interest: Growth Tracker
Stability
• Complex software subject to glitches
• Scalable City designed to run continuously
• Some bugs don’t manifest immediately
– Scalable City grew virtual memory footprint
• Confirmed no memory leaks!
• Tools exist to detect memory leaks quickly
Stability
• Higher-level “memory leak” problems:
– Aggregate structures that persist across processing cycles can grow unbounded
– Not a detectable problem to the system!
• Scalable City uses a massive number of aggregates
– STL & Boost structures, strings, etc.
• Data Structure Growth Tracking Tool
– Override implementation of all aggregates!
Growth Tracker
• Growth tracker is a “singleton class”
– One instance in each program (client/server)
• Every instance tracked by singleton class
– Registers upon creation
• Every instance sampled periodically
– Exponentially-increasing sample time
• Size of each instance tracked over time
– Algorithm detects problem based upon history
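The registration and sampling scheme above might look roughly like the following sketch. Everything here is an assumption made for illustration: the class names `GrowthTracker` and `TrackedVector`, the tick-driven sampling, and in particular the detection rule (flag an aggregate whose last few samples strictly increase), since the slides do not describe the actual history-based algorithm. The real tool overrides aggregates without touching usage code, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Singleton that records the size of every registered aggregate over time.
class GrowthTracker {
public:
    static GrowthTracker& instance() {
        static GrowthTracker tracker;   // one instance per program
        return tracker;
    }
    // Register an aggregate by address with a label and a size probe.
    void add(const void* addr, std::string label, std::function<std::size_t()> probe) {
        entries_[addr] = Entry{std::move(label), std::move(probe), {}};
    }
    void remove(const void* addr) { entries_.erase(addr); }

    // Call once per processing cycle. Samples are taken at ticks 1, 2, 4, 8, ...
    // so the interval between samples grows exponentially.
    void tick() {
        if (++ticks_ < nextSample_) return;
        nextSample_ *= 2;
        for (auto& kv : entries_) kv.second.history.push_back(kv.second.probe());
    }

    // Assumed heuristic: suspicious if the last `window` samples strictly increase.
    std::vector<std::string> suspects(std::size_t window = 4) const {
        assert(window >= 2);
        std::vector<std::string> out;
        for (const auto& kv : entries_) {
            const std::vector<std::size_t>& h = kv.second.history;
            if (h.size() < window) continue;
            bool growing = true;
            for (std::size_t i = h.size() - window + 1; i < h.size(); ++i)
                if (h[i] <= h[i - 1]) { growing = false; break; }
            if (growing) out.push_back(kv.second.label);
        }
        return out;
    }

private:
    struct Entry {
        std::string label;
        std::function<std::size_t()> probe;
        std::vector<std::size_t> history;
    };
    GrowthTracker() = default;
    std::map<const void*, Entry> entries_;
    std::size_t ticks_ = 0;
    std::size_t nextSample_ = 1;
};

// A vector that registers itself on creation and unregisters on destruction.
template <typename T>
class TrackedVector : public std::vector<T> {
public:
    explicit TrackedVector(std::string label) {
        GrowthTracker::instance().add(this, std::move(label),
                                      [this] { return this->size(); });
    }
    ~TrackedVector() { GrowthTracker::instance().remove(this); }
    TrackedVector(const TrackedVector&) = delete;
    TrackedVector& operator=(const TrackedVector&) = delete;
};
```

Doubling the sample interval keeps the history short even for a program that runs for days, while still separating a container that keeps growing from one that only grew during start-up.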
Growth Tracker
• Practical implementation in large project
– Cannot modify data structure usage code!
• Must be self-contained solution
– Must override all aggregates in all files
• Required advanced C++ features
– Complex, but small implementation
– Little data extractable: address, complete type
• Remains installed: one command turns on
Growth Tracker
• Useful software tool
– Generalizes to any long-running program
• Requires running application for 1-2 days
– No general way to detect unintended growth
• Provides more useful output than a crash upon allocation!
– This kind of problem can be impractical to find in large software without such a tool
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games
Status: Continuing
• Project Description:
Digital media environments are increasingly authored by users while they interact with them. This means that components such as their media assets and behaviors are under real-time control, rather than authored in advance. This presents computational challenges in ensuring ongoing real-time performance, and it also creates challenges in tracking assets across multiple types of instantiations.
• Sponsors: IBM, Intel
• Deliverables:
– Improve dynamics and asset computation across virtual worlds and digital cinema
Physical-based Simulation in the Massively Multi-player Scalable City Environment using OpenCL
Review of work to date
Scalable City: Physics Engine Evolution
• Open Dynamics Engine
• Open source, convenient, good reputation
• Augmented/replaced subsystems over time
• Broad phase collision detection designed for large VR environments
• Pipeline redesign for resting objects
• Multi-threaded subsystems for higher activity
• Only the core constraint solver remains ODE
Pipeline Redesign
• Overhead proportional to level of activity, rather than environment scale
• Novel broad phase and pipeline methods
Multithreaded Stages
• Thread-parallelism: limited scale
• Traditional physics methods allow limited parallelism
New Physics Engine
• New physics engine from scratch in C++
• Designed for massive parallelism
– SIMD & massively threaded (via OpenCL)
– Distributed Computing (MPI)
• Unique design for OpenCL physics
– “Advanced Character Physics”, Thomas Jakobsen
Massively Parallel Physics
• Physics atoms are particles & constraints
• Objects represented as set of these atoms
• Rigid Body Dynamics behavior is “emergent”
• Soft bodies can also be modeled integrally
Advantages
• Massive, simple, evenly divided computations
– Collision detection and constraints operate on particles
– All constraints are solved independently
• Eliminates most OpenCL buffer transfers
– No contact graph generation stage
– Broad phase collision detection integrated with collision constraint solving
[Figure: two physics pipelines compared. One has N-Body, Coll. Det., Contact Graph and Integration stages; the other has only N-Body, Coll. Det. and Integration.]
Implementation Progress
Progress: Last Meeting
• What we had finished:
– Particle system with Verlet integration
– Heightmap constraint w/ interpolation, friction, bounce
– Stick constraints
– Rigid body construction from particles + sticks
– Multi-pass relaxation solver
– Object transform extraction from particles
– Dynamic object insertion & removal during simulation
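The first few items on this list follow the scheme in Jakobsen's “Advanced Character Physics”, and a minimal CPU sketch of that scheme is given below. It is an illustration under stated assumptions, not the project's OpenCL code: the type and function names are hypothetical, and the constraints are relaxed sequentially here, whereas the slides describe solving all constraints independently and averaging the results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

struct Particle { Vec3 pos, prev; };              // velocity is implicit: pos - prev
struct Stick { std::size_t a, b; double rest; };  // keeps two particles `rest` apart

// Position Verlet step: x_next = 2x - x_prev + accel * dt^2
inline void integrate(std::vector<Particle>& ps, Vec3 accel, double dt) {
    for (Particle& p : ps) {
        Vec3 next = p.pos + (p.pos - p.prev) + accel * (dt * dt);
        p.prev = p.pos;
        p.pos = next;
    }
}

// Multi-pass relaxation: each pass moves both endpoints of every stick
// halfway toward satisfying its rest length.
inline void relax(std::vector<Particle>& ps, const std::vector<Stick>& sticks, int passes) {
    for (int pass = 0; pass < passes; ++pass) {
        for (const Stick& s : sticks) {
            assert(s.a < ps.size() && s.b < ps.size());
            Vec3 d = ps[s.b].pos - ps[s.a].pos;
            double len = length(d);
            if (len == 0.0) continue;             // direction undefined; skip
            Vec3 corr = d * (0.5 * (len - s.rest) / len);
            ps[s.a].pos = ps[s.a].pos + corr;
            ps[s.b].pos = ps[s.b].pos - corr;
        }
    }
}
```

A rigid body in this model is just a set of particles joined by enough sticks to hold its shape, which is why rigid-body behavior can be described as emergent.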
Progress: Last Meeting
• What we were lacking:
– Support for multiple object topologies in OpenCL
– Efficient OpenCL transfers for object migration
– Parallel OpenCL broad phase collision detection
– Integration into Scalable City incremental physics
– MPI layer for distributed processing
Progress: Current
• Additional Progress:
– Support for multiple object topologies in OpenCL
– Efficient OpenCL transfers for object migration
– Parallel OpenCL broad phase collision detection
– Integration into Scalable City incremental physics
– MPI layer for distributed processing
Object Multi-topology Support
Object Multi-Topologies
• Requires indirection in accessing object info
– Single buffer per format (sticks, particles, forces)
– Mapping from object to ranges in each buffer in CL
• Hole-tracking on host to re-use regions
– Supports dynamic object insertion-removal
– Exact fit replacement
• Efficient for small # of discrete topologies
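Host-side hole tracking with exact-fit replacement can be sketched as a free list keyed by range size. This is a guess at the general shape rather than the project's implementation; `HoleTracker` and its methods are hypothetical names, and offsets are counted in elements of one device buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Host-side bookkeeping for ranges inside one fixed-format device buffer.
class HoleTracker {
public:
    // Returns the start offset of a range of `count` elements.
    std::size_t allocate(std::size_t count) {
        assert(count > 0);
        auto it = holes_.find(count);
        if (it != holes_.end() && !it->second.empty()) {
            std::size_t start = it->second.back();   // exact-fit reuse of a hole
            it->second.pop_back();
            return start;
        }
        std::size_t start = end_;                     // otherwise append at the end
        end_ += count;
        return start;
    }

    // Records a freed range so a later object of the same size can reuse it.
    void release(std::size_t start, std::size_t count) {
        holes_[count].push_back(start);
    }

    std::size_t used_extent() const { return end_; }

private:
    std::map<std::size_t, std::vector<std::size_t>> holes_;  // size -> free starts
    std::size_t end_ = 0;
};
```

Because holes are only reused by ranges of exactly the same size, nothing is ever split or merged. That stays cheap when objects come in a small number of discrete topologies, and it would fragment badly otherwise.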
Object Multi-Topologies
Particles: positions, forces, mappings
Objects: track allocations, object identity
Sticks: rest length, mappings
StickAccum: calculation results, particle x stick
Collision Detection
Collision Detection: filter self collisions
Average Constraints
Produce Constraints
Produce Constraints
OpenCL Host: Hole tracking for Particles, Sticks, StickAccum
OpenCL Memory
Host Memory
Efficient OpenCL Communication
OpenCL Communication
• Most communication has been eliminated
– All operations performed in CL
• Updating multiple buffers required for insertion
– Supports distributed & incremental systems
– Object insertion requires small blits to 9 buffers
• Multiple insertions will be non-contiguous
• Extremely slow when CL is mapped to GPU devices!
Transfer Optimizations
• Must consolidate multi-buffer writes
– One buffer contains data & destination metadata
– A single, contiguous transfer to CL device
• Host-directed Transfers
– Multiple asynchronous clEnqueueCopyBuffer()
• Kernel-directed Transfers
– Kernel execution performs all transfers on card
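The consolidation idea can be shown without any OpenCL calls. In the sketch below, which is plain host C++ under assumed names (`Update`, `pack`, `scatter`) and an assumed record layout, many small writes are packed into one contiguous staging array of `[buffer, offset, count, payload...]` records. The `scatter` function stands in for the kernel-directed path: on a device, a kernel would walk the same records after a single transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "payload is packed as 32-bit words");

// One pending write: `data` goes to element `offset` of destination buffer `buffer`.
struct Update {
    std::uint32_t buffer;
    std::uint32_t offset;
    std::vector<float> data;
};

// Pack all updates into one contiguous array of 32-bit words:
// [buffer, offset, count, payload...] repeated once per update.
inline std::vector<std::uint32_t> pack(const std::vector<Update>& updates) {
    std::vector<std::uint32_t> staging;
    for (const Update& u : updates) {
        staging.push_back(u.buffer);
        staging.push_back(u.offset);
        staging.push_back(static_cast<std::uint32_t>(u.data.size()));
        std::size_t at = staging.size();
        staging.resize(at + u.data.size());
        if (!u.data.empty())
            std::memcpy(&staging[at], u.data.data(), u.data.size() * sizeof(float));
    }
    return staging;
}

// Stand-in for the kernel: read each record and copy its payload to the
// destination named in the metadata.
inline void scatter(const std::vector<std::uint32_t>& staging,
                    std::vector<std::vector<float>>& buffers) {
    std::size_t i = 0;
    while (i + 3 <= staging.size()) {
        std::uint32_t buffer = staging[i];
        std::uint32_t offset = staging[i + 1];
        std::uint32_t count = staging[i + 2];
        i += 3;
        assert(buffer < buffers.size());
        assert(offset + count <= buffers[buffer].size());
        assert(i + count <= staging.size());
        if (count > 0)
            std::memcpy(&buffers[buffer][offset], &staging[i], count * sizeof(float));
        i += count;
    }
}
```

The three header words per record are the metadata overhead that the later slide on future optimizations proposes to reduce.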
Transfer Optimizations
[Chart: CPU-Direct, GPU-Direct, CPU-Host, GPU-Host, CPU-Kernel and GPU-Kernel compared; series InsAlloc, InsWrite, FlushPrep, FlushWrite, FlushRun; value axis 0 to 0.1]
Both methods much preferable to naïve buffer updates!
Transfer Optimizations
[Chart: GPU-Host and GPU-Kernel compared; series InsAlloc, InsWrite, FlushPrep, FlushWrite, FlushRun; value axis 0 to 0.009]
Kernel-driven buffer updates 40% faster in test case
OpenCL Communication
• Future Kernel-Driven Optimizations
– Better load balancing
• Better control of transfer size for each entry
– Lower space overhead
• Reduce meta-data overhead
• Reduces time to transfer single buffer of updates
• Ordering entries by destination buffer is a good start
Parallel OpenCL Broad Phase Collision Detection
OpenCL Broad Phase
• Lack of OpenCL broad phase is now largest overhead
– Communication, nonparallel execution
– 30-50% of execution time
• Learning from well-engineered examples
– nVidia: OpenCL Particle Collision Simulation
• Hash grid: high performance, feature poor
OpenCL Broad Phase
• Grid Limitation: Cell based on object size
– Scalable City uses vastly different object sizes
• House pieces, cyclones, lot entities, gravity fields
• Not an “incremental algorithm”
– Almost impossible in OpenCL: give up
– Some acceleration from temporal coherence?
• Sorting strategy
Parallel Sweep & Prune
• Utilizes intervals for some size variation
• Sort & implicitly subdivide along one axis
• Sort partial buffer along another axis
• Parallel second pass detects overlaps
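For reference, a serial single-axis sweep and prune is sketched below. It is not the parallel OpenCL version described on the slide: it sorts interval starts along x and then checks the remaining axis directly, and the `Box` type and `sweepAndPrune` function are hypothetical names. It can serve as a correctness baseline when testing a parallel variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Box { double minX, maxX, minY, maxY; };  // 2D axis-aligned bounds

// Returns index pairs (first < second) of boxes whose bounds overlap on both axes.
inline std::vector<std::pair<std::size_t, std::size_t>>
sweepAndPrune(const std::vector<Box>& boxes) {
    std::vector<std::size_t> order(boxes.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    // Sort by interval start along the sweep axis (x).
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return boxes[a].minX < boxes[b].minX;
    });

    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < order.size(); ++i) {
        const Box& a = boxes[order[i]];
        assert(a.minX <= a.maxX && a.minY <= a.maxY);
        // Later boxes start at or after a.minX; stop once one starts past a.maxX.
        for (std::size_t j = i + 1; j < order.size(); ++j) {
            const Box& b = boxes[order[j]];
            if (b.minX > a.maxX) break;
            if (a.minY <= b.maxY && b.minY <= a.maxY)
                pairs.emplace_back(std::min(order[i], order[j]),
                                   std::max(order[i], order[j]));
        }
    }
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}
```

Because each box carries its own interval, objects of different sizes are handled without choosing a cell size, which is the property the slide contrasts with the hash grid.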
Space Filling Curve Sort
• Modified Morton numbers provide
– Spatial locality order in one sort pass
– Conservative interval calculation eliminates false negatives
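The slides do not say how the Morton numbers were modified, so the sketch below shows only the standard 2D Morton (Z-order) code as a point of reference: coordinates are quantized to 16 bits and their bits interleaved, so that a single sort on the resulting key places spatially nearby cells close together. The function names and the 16-bit resolution are choices made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Map a coordinate in [lo, hi] to a 16-bit cell index (values outside are clamped).
inline std::uint32_t quantize(double v, double lo, double hi) {
    assert(hi > lo);
    double t = (v - lo) / (hi - lo);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return static_cast<std::uint32_t>(t * 65535.0);
}

// Spread the low 16 bits of v so that bit i moves to bit 2i.
inline std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Standard 2D Morton code: x occupies the even bits, y the odd bits.
inline std::uint32_t morton2D(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

A point key alone cannot represent an object that spans several cells. Presumably that is the gap the conservative interval calculation on the slide addresses, though the slides do not give its details.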
Hash Grid with Queries
• Majority of objects have similar size
– House pieces: mapped to grid for n-body
• Medium size objects queried against range of grid cells in separate kernel
– Lots, cyclones collide with house pieces, but not with each other
– Unit of parallelism improved to Object/Cell pair
• Gravity fields done in different subsystem
OpenCL Broad Phase
• Overlap reporting challenge
– Reporting subset of n² possible pairs efficiently
• Known solution: predict overlaps per object
– Potentially missed overlaps or multiple passes
– Inefficient storage includes gaps
• Does not apply to our use case!
OpenCL Broad Phase
• Our physics solves constraints individually
– No assembling or reporting of pairs necessary
– Kernel solves constraints as they are detected
– Pre-sorted or binned data can be re-used during iterative constraint solving
CD Kernel: Broad Phase, Narrow Phase, Resolution
Dynamic asset generation in the Scalable City environment
Dynamic Land Modification
• Constrained Delaunay Triangulation
Dynamic Land Modification
• Constrained Delaunay Triangulation
– Uses
• Dynamic mesh patching
• Fast mesh reduction of flat regions
– Optimizations
• Geared to reduce copying with our internal data structures
• Parallelized across compute nodes
Dynamic Land Modification
• Constrained Delaunay Triangulation
– Divide & conquer approach, highly parallelizable
– Multi-threaded version is significantly faster
[Chart: Triangulation Time (0s to 25s) versus Number of Points Triangulated (500 to 60K); series Multi-threaded and Single-Threaded]
Dynamic Land Modification
Dynamically Generated Avatars
• Exploring new sources and techniques for automatic avatar generation
• Current avatar made up of vehicles
– 3D structure generated from several photographs of each vehicle
Dynamically Generated Avatars: Hand Avatars
• Extract hand from video using Foreground Object Detection
• Map hand to 3D model
• Animate individual fingers into walking motion
Future Cinema As Augmented Reality
Status: Revised
• Sponsors: IBM, Intel
• Potential Members – Sony and Qualcomm providing In-Kind support
• Deliverables:
– Create New Approaches for Creating Augmented Reality.
• Major results
– 3D 4K movie produced and exhibited
– Wins First Prize from Sony Europe.
Architects have been “augmenting reality” for decades
“Spatial City”, Yona Friedman, 1958: hand-drawn sketch over photograph
Geodesic Dome over NYC, Buckminster Fuller, 1968: collage over aerial photo
Contemporary mobile devices can do this in real time
User-created augmentations inserted into a real scene
Taking AR off the Desktop... Into the City
Building facades as computer-readable markers
Current Approaches Don’t Maximize the Devices’ Potential
• Vision only
• Sensors only
• Both, but not at the same time
GPS doesn’t give pixel-perfect alignment
Vision is slow and requires clear line-of-sight
A hybrid approach is needed!
Hybrid AR Is Inherently Multithreaded
Sensor Fusion
• GPS/sensor processing
• Computer vision
• 3D graphics rendering
• Network communication
That’s just about everything a modern mobile device can do!
Video: http://www.youtube.com/watch?v=hCVZ2TSFI-Y
User Behavior Patterns
Status: Continuing
• Project Description:
Analyze and predict user behavior in the virtual world to inform dynamic modifications to the environment to create a richer virtual experience.
• Research:
Focus on responding to observed correlations of behaviors with:
– State of the virtual world
– Recent and future in-world events
– Visual appearance of the world (user view)
– Previous patterns in user input
User Behavior Patterns: Support
• Complementary grant support:
NSF EAGER (EArly Grants for Exploratory Research):
“Identifying and Integrating Creative Patterns of User Behavior and Experience in Virtual Worlds”
• Grant description:
A new interdisciplinary methodology for both the analysis of users’ experiences in virtual worlds and the design of such worlds. It combines ideas from games design, computer science, information visualization, new media art and media theory.
User Behavior Patterns: Support
• Grant description (continued):
If successful, game designers, HCI researchers, and games and media scholars will be able to analyze, visualize and interpret the dimensions of user experiences with interactive time-based cultural artifacts such as video games, animated interfaces, and interactive artworks which are not captured with current analytics techniques. At the same time, by incorporating the new analytics techniques in virtual world generation software, the project aims to advance the current research on how to create interfaces and simulations which analyze user performance and dynamically adapt based on the results of the analysis.
User Behavior Patterns: Background
• Present state of knowledge in virtual world analysis
– network analysis (connectivity, load, latency)
– econometrics on virtual economies
– profiling player game play
• Primarily driven by game companies
– during development (Microsoft Labs, Halo series)
– for an ongoing MMOG (Blizzard, World of Warcraft)
– over a game network service (Steam, Xbox Live)
User Behavior Patterns: Methodology
• New methods: Cultural Analytics
– “the use of computational methods for the analysis of patterns in visual and interactive media.”
– Data mining, knowledge exploration, and information visualization as applied to cultural artifacts and experiences such as paintings, cartoons, or virtual worlds.
• Logging, visualizing, designing
– Record events in the world and telemetry on the user
– Visualize spatial, temporal, and narrative patterns
– Explore mechanisms to dynamically accommodate behavior patterns in virtual world design
User Behavior Patterns: Logging
• Event logging
– Server-side code hooks fire when an event occurs
– Events logged as time-stamped “triples” (subject-verb-object)
• Object / user interactions (Player1 activates Object5)
• World state changes
• User telemetry logging
– Data is polled from client at set rate (1/sec) and logged on server
• User input (trackball direction, velocity)
• User avatar position / orientation
• User camera position / orientation / type
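A time-stamped subject-verb-object log of the kind described above could be as simple as the following sketch. The `Event` and `EventLog` names, the millisecond timestamp and the tab-separated dump format are all assumptions for illustration; the slides do not specify the storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One logged event: a time-stamped subject-verb-object triple.
struct Event {
    std::uint64_t timestampMs;
    std::string subject, verb, object;
};

class EventLog {
public:
    // Called from a server-side hook when an event occurs.
    void record(std::uint64_t timestampMs, std::string subject,
                std::string verb, std::string object) {
        assert(!subject.empty() && !verb.empty());
        events_.push_back(Event{timestampMs, std::move(subject),
                                std::move(verb), std::move(object)});
    }

    // All events with the given subject, in the order they were recorded.
    std::vector<Event> bySubject(const std::string& subject) const {
        std::vector<Event> out;
        for (const Event& e : events_)
            if (e.subject == subject) out.push_back(e);
        return out;
    }

    // One tab-separated line per event; fields are assumed not to contain tabs.
    std::string dump() const {
        std::ostringstream os;
        for (const Event& e : events_)
            os << e.timestampMs << '\t' << e.subject << '\t'
               << e.verb << '\t' << e.object << '\n';
        return os.str();
    }

    std::size_t size() const { return events_.size(); }

private:
    std::vector<Event> events_;
};
```

For the example from the slide, `log.record(1000, "Player1", "activates", "Object5")` stores one triple, and `bySubject("Player1")` later retrieves that player's events for visualization.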
User Behavior Patterns: Event Logging
User Behavior Patterns: 2D projection of virtual world coordinates
Logging data of user positions as tracked in abstract space
User Behavior Patterns: 2D and 3D projection
• Coordinate spaces for information visualization
– Virtual world is computed and rendered on a complex 3D surface
– 2D projections are important to visual understanding
– Example: 3D “box world” to 2D “unfolded box” projection
User Behavior Patterns: 2D and 3D projection
• Coordinate spaces for information visualization
– Automatically generated interactive animations and timelines for user paths through the virtual world space
User Behavior Patterns: 2D and 3D projection
• Coordinate spaces for information visualization
– Multiple path views or overlays
– Time-based trails with past-present-future coloring
– “Heatmap” density overlay to indicate amount of time spent in each place
– Spectral coloring to show passage of time along player path without animation
User Behavior Patterns: 2D and 3D projection
• Coordinate spaces for information visualization
– Interactive viewer
– Multiple information overlays
– Dynamically transform from one coordinate space to another
User Behavior Patterns: Issues
• Challenge: exact replay of sessions from log data, or exact parallel playback in different visual modes
– Client and network optimizations create special classes of synced vs. non-synced world objects and events
– Each client sees the same world, yet a different world
• Particle systems
• Lag
• Precision
• Randomization
• Deducing implicit cause-effect relationships that aren’t modeled by the server
– e.g. “The last user to touch it is the one that did it”