Polygon Rendering on a Stream Architecture

John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, Ben Mowery

Concurrent VLSI Architecture Group

Computer Systems Laboratory

Stanford University

Today’s Best Hardware

• Commercial hardware: fast, cheap, ubiquitous

• Flexibility limited

• OpenGL scenes: Programmable streams deliver comparable performance.

Frame from Quake 3 Arena, © id Software

Today’s Best Software

• Today’s software solutions: powerful and flexible, but slow

• OpenGL scenes: Streams deliver 20x performance.

Frame from A Bug’s Life, © Pixar Animation Studios, 1998

The Vision

• Performance of a special-purpose processor

• Programmability of a general-purpose processor

• “Real-Time RenderMan”

Outline

• What is stream processing?

• The Imagine architecture

• Polygon rendering on a stream architecture

• Results

• Conclusions

Kernels and Streams

• A stream is a set of elements of an arbitrary datatype.

• A computational kernel operates on streams.

[Figure: a Transform kernel with its input and output streams]
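The two definitions on this slide can be illustrated with a minimal Python sketch. This is not Imagine's kernel language: the stream is modeled as an ordinary iterable, the kernel as a generator, and the names `Vec4`, `Mat4`, and `transform_kernel` are hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]   # hypothetical vertex record
Mat4 = Sequence[Sequence[float]]           # 4x4 matrix, row-major

def transform_kernel(vertices: Iterable[Vec4], m: Mat4) -> Iterator[Vec4]:
    """Kernel: consume a stream of vertices, produce a stream of transformed vertices."""
    for v in vertices:
        # The same computation is applied to every element of the stream.
        yield tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

# Translation by (1, 2, 3) applied to a two-element vertex stream.
translate = [[1, 0, 0, 1],
             [0, 1, 0, 2],
             [0, 0, 1, 3],
             [0, 0, 0, 1]]
out = list(transform_kernel([(0.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)], translate))
```

The kernel never indexes into the stream or keeps state between elements, which is what lets the same code run on many elements in parallel.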

Stream Processing

• All data is streams!

• Two levels of programming: stream-level code and kernel-level code

[Figure: pipeline diagram with kernels Transform, Shader, Zcompare; buffers ZBuffer, ColorBuffer; stream labels z,color and offset]
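A rough Python sketch of the two programming levels follows. The kernels here (`scale_kernel`, `add_kernel`) and the program `stream_program` are hypothetical stand-ins, not kernels from the rendering pipeline. Kernel-level code does the per-element arithmetic, while stream-level code only names streams and sequences kernel calls.

```python
def scale_kernel(stream, factor):
    """Kernel-level code: per-element arithmetic on one input stream."""
    return [x * factor for x in stream]

def add_kernel(a, b):
    """Kernel-level code: combines two streams element by element."""
    return [x + y for x, y in zip(a, b)]

def stream_program(xs, ys):
    """Stream-level code: declares streams and sequences kernels, no arithmetic."""
    scaled = scale_kernel(xs, 2.0)   # intermediate stream produced by one kernel
    return add_kernel(scaled, ys)    # and consumed directly by the next

result = stream_program([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

The intermediate stream `scaled` is produced by one kernel and immediately consumed by the next, which is the producer-consumer pattern the pipeline figure depicts.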

Media Apps and Streams

• Producer-consumer locality

• High arithmetic requirements

• Homogeneous computation: efficient control, data parallelism

… poor match for microprocessors

The Imagine Architecture

[Figure: block diagram of the Imagine Stream Processor: four SDRAMs, Streaming Memory System, Stream Controller, Host Processor, Network Interface, Stream Register File]

Bandwidth Hierarchy

[Figure: Imagine block diagram with ALU clusters under SIMD/VLIW control, annotated with peak bandwidths]

Peak BW: 4 GB/s, 32 GB/s, 544 GB/s
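The spread between the three peak figures is easier to see as ratios. The short calculation below uses only the numbers on the slide and does not assume which level of the hierarchy each one belongs to.

```python
peak_bw_gbs = [4, 32, 544]   # peak bandwidths from the slide, in GB/s

# Each figure expressed relative to the smallest one.
ratios = [bw / peak_bw_gbs[0] for bw in peak_bw_gbs]

# Step between adjacent figures.
steps = [peak_bw_gbs[i + 1] / peak_bw_gbs[i] for i in range(len(peak_bw_gbs) - 1)]
```

The largest peak figure is 136 times the smallest, with steps of 8x and 17x between adjacent figures.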

Cluster Organization

[Figure: one ALU cluster with paths from the SRF and to the SRF; functional units labeled + + * * /; cross point; local register file]

Imagine Stats & Status

• 0.59 cm² CMOS chip, 500 MHz

• Circuits/Logic: expected completion 9/15/00

• Tapeout: expected Q4/2000; fab: TI GS30KA process (0.15 µm drawn)

Polygon Rendering Outline

• Overview of OpenGL pipeline

• How we map OpenGL into streams & kernels

• How stream operations are sequenced

• How kernels are mapped onto Imagine: use of stream recirculation; detail of 3 steps in the pipeline (matrix transformation, scan conversion, enforcing ordering in composite stage)

OpenGL Pipeline Overview

[Figure: pipeline stages, in order: Application, Geometry, Rasterization, Image Composite]

OpenGL:

• Has state

• Requires immediate mode

• Respects ordering

Pipeline Detail

[Figure: kernel pipeline starting from Input Data. Geometry: Transform, GLShader, PrimitiveAssembly, Project. Rasterization: Spanprep, Spangen, Spanrast, TextureLookup. Composite: Z Lookup, Zcompare, Compact, Color, ZWrite. Also shown: Sort/Merge]

Pipeline Stream Datatypes

[Figure: the Geometry, Rasterization, and Composite kernel pipeline annotated with stream datatypes: vertices, triangles, spans, fragments, offsets, depths]

Most data is floating point.

Stream Recirculation

[Figure: Transform, Shader, and Zcompare kernels with ZBuffer and ColorBuffer, laid out across Memory, SRF, and Clusters; stream labels z,color and offset]

• Strip-mining

• Memory accesses: initial load of vertices; lookup of color/z/texture; writeback of color/z

• All other data accesses are local to the SRF
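Strip-mining can be sketched as follows. This is an illustrative Python model only: `strip_mine`, `run_pipeline`, and the batch size are hypothetical, and the on-chip SRF is stood in for by an ordinary local variable. The input is cut into batches, each batch is loaded once, passed through all kernels, and written back once.

```python
from typing import Callable, Iterator, List, Sequence

def strip_mine(data: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def run_pipeline(vertices: Sequence, batch_size: int,
                 kernels: List[Callable[[Sequence], Sequence]]) -> list:
    results = []
    for batch in strip_mine(vertices, batch_size):   # one load per batch
        stream = batch
        for kernel in kernels:                       # intermediates are never written out
            stream = kernel(stream)
        results.extend(stream)                       # one writeback per batch
    return results

batches = list(strip_mine(list(range(10)), 4))
final = run_pipeline(list(range(10)), 4,
                     [lambda s: [x + 1 for x in s], lambda s: [x * 2 for x in s]])
```

In this model the batch size plays the role of the SRF capacity: it bounds how much intermediate data exists at any one time.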

Stream and Kernel Flow

[Figure: timeline excerpt from an ADVS-1 run with columns CLUSTERS, MEM STR 0, MEM STR 1; kernels xform, project, assemble, rasterize, zcompare; memory operations Z load, Z store, Color store, Texture load, and Vertex load for next batch]

Mapping Xform to Imagine

[Figure: the Transform kernel mapped onto multiple clusters, shown across Memory, SRF, and Clusters]

Mapping Spanrast to Imagine

[Figure: the Spanrast kernel shown across Memory, SRF, and Clusters]

Enforcing ordering

• General sort possible, but too expensive

• Hash much cheaper! Hash function: 12 bits (low 6 bits of x, low 6 bits of y). Hash table: 2^12 entries, 2 bits/entry, 16 words/scratchpad/cluster

• Compact: enforces ordering constraint

[Figure: the Compact and Zcompare kernels]
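The 12-bit hash can be sketched in Python as below. The bit layout (y in the high 6 bits, x in the low 6) and the way the hash is used are assumptions for illustration; the slide only states which bits are hashed. Here a hypothetical `split_conflicts` helper defers any fragment whose hash slot is already taken in the current batch, so that fragments possibly touching the same pixel keep their submission order.

```python
AXIS_BITS = 6
AXIS_MASK = (1 << AXIS_BITS) - 1      # 0x3F: low 6 bits
TABLE_ENTRIES = 1 << (2 * AXIS_BITS)  # 2^12 = 4096 slots

def fragment_hash(x: int, y: int) -> int:
    """12-bit hash from the low 6 bits of x and y (bit order is an assumption)."""
    return ((y & AXIS_MASK) << AXIS_BITS) | (x & AXIS_MASK)

def split_conflicts(fragments):
    """Split (x, y) fragments into a conflict-free first pass and a deferred list.

    Order is preserved within each list. A hash collision between different
    pixels is treated conservatively as a conflict.
    """
    seen = [False] * TABLE_ENTRIES
    first_pass, deferred = [], []
    for x, y in fragments:
        slot = fragment_hash(x, y)
        if seen[slot]:
            deferred.append((x, y))
        else:
            seen[slot] = True
            first_pass.append((x, y))
    return first_pass, deferred

first, later = split_conflicts([(1, 2), (3, 4), (1, 2), (65, 2)])
```

In the usage line, `(65, 2)` is deferred even though it is a different pixel from `(1, 2)`: 65 and 1 share the same low 6 bits, so the hash cannot tell them apart. That false conflict costs extra work but never breaks ordering.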

Image Composition

[Figure: the Zcompare kernel on a cluster, shown across Memory, SRF, and Clusters; Z Buffer and Color Buffer in memory; streams labeled offset, z, and color]
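The composite step in the figure can be approximated by the Python sketch below. It assumes that a smaller z is nearer and that each offset appears at most once per batch, which is the condition the ordering step is there to guarantee; the function name and signature are hypothetical. Old depths are gathered by offset, compared against the incoming fragments, and the surviving color and z values are scattered back.

```python
def zcompare(offsets, depths, colors, zbuffer, colorbuffer):
    """Depth-test one batch of fragments and write back the survivors.

    Assumes unique offsets within the batch and "smaller z wins".
    Returns the offsets that were written.
    """
    old_z = [zbuffer[o] for o in offsets]                  # gather: Z lookup
    passed = [(o, z, c)
              for o, z, c, zo in zip(offsets, depths, colors, old_z)
              if z < zo]                                   # depth comparison
    for o, z, c in passed:                                 # scatter: color and Z write
        zbuffer[o] = z
        colorbuffer[o] = c
    return [o for o, _, _ in passed]

zbuf = [1.0, 1.0, 1.0, 1.0]
cbuf = [0, 0, 0, 0]
written = zcompare([0, 2, 3], [0.5, 1.5, 0.25], [11, 22, 33], zbuf, cbuf)
```

Because all old depths are gathered before any writes happen, two fragments at the same offset in one batch would both be tested against the stale value, which is why the unique-offset assumption matters.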

Benchmarks

• ADVS-1: 62k vertices as point-sampled polygons (SPECviewperf 6.1.1 Advanced Visualizer)

• ADVS-8: mipmapped version of ADVS-1

• Sphere: 82k lit, Gouraud-shaded triangles; 3 positional lights

• Fill: 20k mipmapped 25-pixel triangles

[Figure: Sphere]

Experimental setup

• Comparison systems: Microsoft opengl32.dll (sustained), NVIDIA Quadro (sustained), NVIDIA Quadro (peak)

• Test system: 450 MHz PIII Xeon, NT 4.0

• For comparison: low-overhead trace player (no application overhead); average over hundreds of frames (no startup costs); vsync disabled

Results Summary

[Figure: chart comparing Software (opengl32.dll), Imagine sustained, NVIDIA sustained, and NVIDIA peak on advs-1, sphere, and advs-8]

Stream-level Performance

• Computation, not memory, bound: highest memory system occupancy is 58.7%

• Cluster occupancy: 94.3% - 98.8% Reuse

• 5.6 GOPS on Sphere

[Figure: timeline with columns CLUSTERS, MEM STR 0, MEM STR 1]

Imagine Kernel Breakdown

[Figure: per-kernel breakdown for ADVS-8 with kernels project, assemblepoly, backfacecull, spanprep, spangen, spanrast, texfilter, sort, compact, zcompare, grouped into geometry, rasterization, and composite]

• Majority of time is in rasterization

• ADVS-8 has 2.5x the ops/frame of ADVS-1

Future Directions

• Extend generality of OpenGL pipeline: add more complex scenes

• Programmable shading and lighting: straightforward to add per-vertex/per-fragment ops; eliminate multipass; goal is a “toolbox” of flexible elements

• Non-polygon rendering: raytracing, IBR, …

• Scalability: multi-Imagine implementations

Conclusions

• Streams: powerful primitive

• Stream architectures: enable high performance

• Flexibility of general-purpose processor: 20x better frame rates than commercial software

• Performance of special-purpose processor: comparable frame rates to commercial hardware

Acknowledgements

• DARPA

• Industrial sponsors: Texas Instruments, Intel Corporation

• Matthew Eldridge and Kekoa Proudfoot

• Brian Towles and Brucek Khailany

• Anonymous reviewers for helpful comments

• The US Passport Office: same-day turnaround!
