USB Performance Analysis of Bulk Traffic

  • 8/10/2019 USB Performance Analysis of Bulk Traffic

    Platform Architecture Lab

    USB Performance Analysis of Bulk Traffic

    Brian [email protected]

    Introduction

    Bulk Traffic
        Designed for reliable, highly variable data transfer
        No guarantees are made in the specification for throughput
        Is scheduled last, after ISOC, Interrupt, and Control
        Throughput is dependent on many factors

    Introduction

    We will look at Bulk Throughput from the following aspects:
        Distribution of Throughput for Various Packet Sizes and Endpoints
        Low Bandwidth Performance
        Small Endpoint Performance
        NAK Performance
        CPU Utilization
        PCI Bus Utilization

    Test Environment -- Hardware

    PII 233 (8522px) with 512 KB Cache
    Atlanta Motherboard with 440LX (PIX 4A) Chipset
    32 Meg Memory
    Symbios OHCI Controller (for OHCI Measurements)
    Intel Lava Card as Test Device

    Test Environment -- Software

    Custom Driver and Application
        Test Started by IOCTL
        IOCTL allocates static memory structures, submits IRP to USBD
        Completion routine resubmits next buffer
        All processing done at ring 0, IRQL_DISPATCH

    Terminology

    A Packet is a single packet of data on the bus. Its size is determined by the Max Packet Size of the device.
        Valid numbers are 8, 16, 32, 64
    A Buffer is the amount of data sent to USBD in a single IRP.
        In this presentation buffers range from 8 Bytes to 64K Bytes
    Unless otherwise specified, most data was taken at 64 Byte Max Packet Size, with 15 Endpoints configured in the system

    Host Controller Operation (UHCI)

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (UHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32767)
    Callouts: Single Endpoint Throughput; Flat Throughput @ 512 and 1024 Byte Buffers; Oscillations @ 256 and 512 Byte Buffers; Small Buffer Throughput

    Small Buffer Throughput

    For Buffer Sizes < Max Packet Size
        Host Controller sends 1 Buffer per Frame
        No Ability to Look Ahead and Schedule Another IRP Even Though Time Remains in the Frame
    Why is this?

    Interrupt Delay

    [Diagram: frame timeline. Labels: Start of Frame; Interrupt; Unused Frame; Software Latency; Last Packet, Buffer 'n'; First Packet, Buffer 'n+1']
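
    The delay pictured on this slide can be sketched as a toy frame-by-frame model. This is an illustration under stated assumptions, not the driver that was measured: it assumes one IRP outstanding at a time, at most 15 packets of 64 bytes per 1 ms frame, and that a buffer finished in one frame is only resubmitted in time for the following frame. The function name and its parameters are hypothetical.

```python
def simulate_single_endpoint(buffer_bytes, frames=1000,
                             packet_bytes=64, packets_per_frame=15):
    """Return the bytes moved in `frames` 1 ms frames with one IRP outstanding."""
    moved = 0
    remaining = buffer_bytes  # bytes left in the buffer currently in flight
    for _ in range(frames):
        budget = packets_per_frame
        while budget > 0 and remaining > 0:
            sent = min(packet_bytes, remaining)
            remaining -= sent
            moved += sent
            budget -= 1
        if remaining == 0:
            # Assumption: completion is handled at the frame boundary, so the
            # resubmitted buffer starts in the next frame and the rest of this
            # frame stays unused.
            remaining = buffer_bytes
    return moved


if __name__ == "__main__":
    for size in (8, 64, 512, 1024):
        print(size, simulate_single_endpoint(size))
```

    Under these assumptions a buffer that fits in one frame yields exactly one buffer per frame, whatever time is left after its last packet.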

    Single Endpoint Graph

    Flat Throughput @ 1024 and 512 Byte Graphs
    Single Endpoint Throughput for 64K Byte Buffers Below Theoretical Max of 1216000 Bytes per Second
    Both are explained by Looking at the Number of Packets per Frame

    Maximum Packets per Frame

    Buffer    Max Bytes   Frames     Bytes   Total    Max Expected   Measured
    Size      per Frame   for Bulk   Left    Frames   Throughput     Throughput
    (Bytes)   (15 x 64)   of Data    Over             (Bytes/Sec)    (Bytes/Sec)
    8         960         1          0       1        8000           8071
    16        960         1          0       1        16000          16082
    32        960         1          0       1        32000          32293
    64        960         1          0       1        64000          64264
    128       960         1          0       1        128000         129186
    256       960         1          0       1        256000         255667
    512       960         1          0       1        512000         512017
    1024      960         1          64      2        512000         515515
    2048      960         2          128     3        682666         682803
    4096      960         4          256     5        819200         819200
    8192      960         8          512     9        910222         910131
    16384     960         17         64      18       910222         910404
    32768     960         34         128     35       936228         936072
    65536     960         68         256     69       949797         948087
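
    The derived columns of this table follow from simple arithmetic, sketched below as a check. The 960 byte frame capacity (15 packets of 64 bytes) and the 1000 frames per second rate are taken from the slide; the function name, the constants, and the choice of which measured rows to compare are illustrative assumptions.

```python
FRAME_BYTES = 15 * 64      # 960 bytes: 15 packets of 64 bytes per 1 ms frame
FRAMES_PER_SECOND = 1000   # full-speed USB frame rate


def expected_row(buffer_bytes):
    """Return (bulk_frames, leftover_bytes, total_frames, bytes_per_second)."""
    if buffer_bytes <= FRAME_BYTES:
        # The slide counts any buffer that fits in one frame as one full frame.
        bulk_frames, leftover = 1, 0
    else:
        bulk_frames, leftover = divmod(buffer_bytes, FRAME_BYTES)
    total_frames = bulk_frames + (1 if leftover else 0)
    throughput = buffer_bytes * FRAMES_PER_SECOND // total_frames
    return bulk_frames, leftover, total_frames, throughput


# A few measured values copied from the table (bytes per second).
MEASURED = {8: 8071, 512: 512017, 1024: 515515, 8192: 910131, 65536: 948087}

if __name__ == "__main__":
    for size, measured in MEASURED.items():
        *_, expected = expected_row(size)
        error = 100.0 * (measured - expected) / expected
        print(f"{size:>6} bytes: expected {expected:>7}, "
              f"measured {measured:>7} ({error:+.2f}%)")
```

    The partly filled last frame is what keeps 64K byte buffers below the 1216000 figure: 65536 bytes need 69 frames even though 68 of them are full.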

    [Chart: Throughput for Multiple Endpoints, 512 Byte Buffers]
    Axes: Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15)

    512 Byte Buffers 1 Endpoint

    8 Packets * 64 Bytes per Packet = 512,000 B/S (511,986 B/S Measured)

    EndPoint   Inter Delay (Bits)   Packet Number   End Time (Bits)
    1 SOF 1000 0 1 2 3 4 5 6 7 8 5000

    8 Packets Total per Frame

    512 Byte Buffers 2 Endpoints

    16 Packets * 64 Bytes per Packet = 1,024,000 B/S (1,022,067 B/S Measured)
    Notice that Interrupt Delay is not a factor here!

    EndPoint   Inter Delay (Bits)   Packet Number   Ending Time (Bits)
    2 SOF 5 7 0 1 2 3 4 5 6
    1 0 1 2 3 4 5 6 7 480

    16 Packets Total per Frame

    512 Byte Buffer -- 3 Endpoints

    24 Packets * 64 Bytes / 2 Frames = 768,000 B/S (776,211 B/S Measured)

    For Frame N

    End Point   Inter Delay (Bits)   Packet Number   Ending Time
    3 S 0 1 2 3 4 5 554
    2 O 1000 0 1 2 3 4
    1 F 0 1 2 3

    15 Packets Total in This Frame

    For Frame N + 1

    End Point   Inter Delay (Bits)   Packet Number   Ending Time
    3 S 6 7
    2 O 5 5 6 7
    1 F 4 5 6 7 5700

    9 Packets Total in This Frame
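
    The headline figures on the three 512 byte buffer slides all use the same calculation: packets per round, times 64 bytes, divided by the frames one round takes. A small sketch of that calculation follows; the function name and the table of cases are illustrative, while the packet counts, frame counts, and measured values are the ones quoted on the slides.

```python
PACKET_BYTES = 64
FRAMES_PER_SECOND = 1000  # one frame per millisecond


def round_throughput(packets, frames):
    """Bytes per second when `packets` full packets repeat every `frames` frames."""
    return packets * PACKET_BYTES * FRAMES_PER_SECOND // frames


# (endpoints, packets per round, frames per round, measured bytes per second)
CASES = [
    (1, 8, 1, 511986),
    (2, 16, 1, 1022067),
    (3, 24, 2, 776211),
]

if __name__ == "__main__":
    for endpoints, packets, frames, measured in CASES:
        expected = round_throughput(packets, frames)
        print(f"{endpoints} endpoint(s): expected {expected}, measured {measured}")
```

    With three endpoints the 24 packets no longer fit in one frame, so the round spills into a second frame and total throughput drops below the two endpoint case.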

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (OHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)
    Callouts: High End Throughput, 18 PPF vs. 17 PPF; Single Endpoint Throughput, 900,000 vs. 950,000 B/S; Flat Throughput @ 512 and 1024 B Buffers; Small Buffer Throughput; Oscillations @ 256 and 512 B Buffers

    Minimal Endpoint Configuration

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (UHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)
    Callout: Higher Single Endpoint Throughput, 17 vs. 15 PPF

    Host Controller Operation (UHCI)

    [Chart: Throughput of a Single Endpoint in Single and Multiple Endpoint Configurations (UHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,200,000); Buffer Size (Bytes, 8 to 65536)
    Series: Single; Multiple

    Results

    We are working with Microsoft to remove unused endpoints from the Host Controller Data Structures

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (OHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)
    Callouts: Higher Single Endpoint Throughput; More Endpoints get 18 Packets per Frame

    Distribution of Throughput across Endpoints

    [Chart: Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (UHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,000,000); Endpoint Number (1 to 15); Number of Endpoints (0 to 15)

    Results

    We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.
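
    Why the starting point matters can be sketched with a toy round-robin model. Everything here is hypothetical: the function, the one-packet-per-endpoint walk, and the budget of 17 packets per frame are assumptions chosen for illustration, not a description of the actual Host Controller driver.

```python
def packets_per_endpoint(endpoints, frames, budget, resume):
    """Count packets each endpoint gets when `budget` packets fit in a frame.

    The walk hands one packet to each endpoint in turn. With resume=False every
    frame starts again at the first endpoint; with resume=True it continues
    from where the previous frame stopped.
    """
    counts = [0] * endpoints
    start = 0
    for _ in range(frames):
        position = start if resume else 0
        for _ in range(budget):
            counts[position % endpoints] += 1
            position += 1
        start = position % endpoints
    return counts


if __name__ == "__main__":
    print("restart:", packets_per_endpoint(15, 15, 17, resume=False))
    print("resume: ", packets_per_endpoint(15, 15, 17, resume=True))
```

    In this model, restarting every frame gives the first two of 15 endpoints twice the packets of the rest, while resuming spreads the same total evenly.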

    [Chart: Throughput by Endpoint vs. Number of Endpoints, 64K Byte Buffers (OHCI)]
    Axes: Throughput (Bytes per Second, 0 to 900,000); Endpoint Number (1 to 15); Number of Endpoints (0 to 15)

    Limited Bandwidth Operation

    [Chart: Throughput by Endpoint vs. Number of Endpoints, 1023 Bytes / Frame Isoc Traffic (UHCI)]
    Axes: Throughput (Bytes per Second, 0 to 300,000); Endpoint Number (1 to 15); Number of Endpoints (1 to 15)

    [Chart: Throughput by Endpoint vs. Number of Endpoints, 768 Bytes / Frame Isoc Traffic (OHCI)]
    Axes: Throughput (Bytes per Second, 0 to 400,000); Endpoint Number (1 to 15); Number of Endpoints (1 to 15)

    Small Endpoint Performance

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (UHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 450,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8 Byte Max Packet Size (OHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 500,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)

    [Chart: Total Throughput for a Single Endpoint for Various Packet Sizes (OHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,000,000); Buffer Size (8 to 65536)
    Legend: 8; 16; 32; 64

    [Chart: Throughput by Endpoint vs. Number of Endpoints, Mixed 64 and 8 Byte Endpoints (UHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,000,000); Endpoint Number (1 to 15); Number of Endpoints (1 to 15)

    If you care about throughput:
        Use 64 Byte Max Packet Size Endpoints
        Use Large Buffers

    Nak Performance

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints with 1 Endpoint NAKing 64 Bytes OUT (OHCI)]
    Axes: Total Throughput (0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)

    [Chart: Single Endpoint Throughput With 64 Byte Endpoint NAKing on the Bus (OHCI)]
    Axes: Throughput (Bytes per Second, 0 to 1,000,000); Buffer Size (8 to 65536)
    Series: No NAK; NAK
    Callout: 45% Drop in Total Throughput

    [Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 14 Endpoints OUT, 1 Endpoint NAK IN (UHCI)]
    Axes: Total Throughput (Bytes per Second, 0 to 1,200,000); Number of Endpoints (1 to 15); Buffer Size (Bytes, 8 to 32768)

    [Chart: Single Endpoint Throughput, One Endpoint NAKing IN]
    Axes: Throughput (Bytes per Second, 0 to 1,000,000); Buffer Size (8 to 65536)
    Series: NAK; No NAK

    CPU Utilization

    CPU Utilization

    Idle process incrementing a counter in main memory
        Designed to simulate a heavily CPU bound load
    Numbers indicate how much work the CPU could accomplish after servicing USB traffic
        Higher numbers are better
    Small buffers and large numbers of Endpoints take more overhead
        Software Stack Navigation
    Endpoint 0 is the Control -- No USB Traffic running
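
    One way to read the idle counts is to compare each of them with the control run that had no USB traffic. The sketch below shows that arithmetic only; the function names are hypothetical and the two counts are made-up values, not measurements from this study.

```python
def cpu_share_left(idle_count, baseline_idle_count):
    """Fraction of the no-traffic idle work still completed under USB load."""
    if baseline_idle_count <= 0:
        raise ValueError("baseline idle count must be positive")
    return idle_count / baseline_idle_count


def usb_overhead_percent(idle_count, baseline_idle_count):
    """Percentage of idle work lost to servicing USB traffic."""
    return 100.0 * (1.0 - cpu_share_left(idle_count, baseline_idle_count))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    baseline = 12_000_000   # control run: 0 endpoints, no USB traffic
    loaded = 9_000_000      # same interval with bulk traffic running
    print(f"USB overhead: {usb_overhead_percent(loaded, baseline):.1f}%")
```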

    [Chart: CPU Utilization (UHCI)]
    Axes: Idle Count (0 to 12,000,000); Buffer Size (Bytes, 2048 to 65536); Number of Endpoints (0 to 15)

    [Chart: CPU Utilization (OHCI)]
    Axis ticks recovered: 0 to 12,000,000; 2048 to 65536; 0 to 15

    PCI Utilization

    [Chart: PCI Utilization (UHCI)]
    Axes: % Utilization (0 to 35); Buffer Size (2048 to 65536); Number of Endpoints (0 to 15)

    PCI Utilization (UHCI) 15 Endpoint Configuration

    For low numbers of active endpoints, the Host Controller must poll memory for each unused endpoint, causing relatively high utilization.
    Removing unused endpoints will lower single endpoint PCI utilization for this configuration.

    Conclusions

    UHCI Host Controller Driver needs a few tweaks
        Need to get Host Controller to start sending packets where it last left off rather than at endpoint 1.
        Needs to remove unused endpoints from the list
    Performance Recommendations
        Use 64 Byte Max Packet Size Endpoints
        Large Buffers are better than small buffers
        Reduce NAKd traffic
        Fast devices if possible

    Future Research Topics

    Multiple IRPs per Pipe
    USB needs to control throughput to the slow device
        Small Endpoints aren't good
        Small Buffers aren't good
        NAKing isn't good