Flexible and Efficient Control of Data Transfers for Loosely Coupled Components Joe Shang-Chieh Wu http://meou.us Department of Computer Science University of Maryland, USA
  • date post

    15-Jan-2016



Flexible and Efficient Control of Data Transfers for Loosely Coupled Components

Joe Shang-Chieh Wu
http://meou.us

Department of Computer Science
University of Maryland, USA


What & How

• Obtain more accurate results by coupling existing (parallel) physical simulation components

• Different time and space scales for data produced in shared or overlapped regions

• Runtime decisions for which time-stamped data objects should be exchanged

• Performance might be a concern


Roadmap

• Approximate Match [Grid 2004]

• Collective Buffering [IPDPS 2007]

• Distributed Approximate Match + Eager Transfer [under submission]

• Conclusion


Matching is OUTSIDE components

• Separate matching (coupling) information from the participating components
  – Maintainability: components can be developed/upgraded individually
  – Flexibility: change participants/components easily
  – Functionality: support variable-sized time interval numerical algorithms or visualizations


Basic Operation

[Figure: layered diagram. Labels: Distributed Array Transfer Library; Runtime-based Approximate Match Library. An importer component requests the array for T = 2.5; approximate match against the exporter component's exported distributed arrays (T = 1, 2, 3, 4) returns the matched array for T = 3 as the imported distributed array. Arrays are distributed among multiple processes.]


Separate codes from matching

Exporter App0

    define region R1
    define region R4
    define region R5
    ...
    Do t = 1, N, Step0
      ... // computation jobs
      export(R1,t)
      export(R4,t)
      export(R5,t)
    EndDo

Importer App1

    define region R2
    ...
    Do t = 1, M, Step1
      import(R2,t)
      ... // computation jobs
    EndDo

Configuration file

    #
    App0 cluster0 /bin/App0 2 ...
    App1 cluster1 /bin/App1 4 ...
    App2 cluster2 /bin/App2 16 ...
    App4 cluster4 /bin/App4 4
    #
    App0.R1 App4.R0 REGL 0.05
    App0.R1 App2.R0 REG 0.1
    App0.R4 App1.R2 REGU 0.5
    #

Each connection line gives, in order: source, sink, connection-wise approximate match policy, precision.

Example for "App0.R4 App1.R2 REGU 0.5" and an import at time t: find t’ in App0, s.t. (a) t <= t’ <= t + 0.5 (b) minimize t’ – t
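The connection line "App0.R4 App1.R2 REGU 0.5" can be read as a small lookup plus a search over exported timestamps. The sketch below is only an illustration of that reading: the names `parse_connections` and `match_regu`, the assumption that `#` lines separate the program section from the connection section, and the sample timestamps are all hypothetical, not the library's interface.

```python
from typing import List, NamedTuple, Optional, Sequence


class Connection(NamedTuple):
    source: str       # exporter region, e.g. "App0.R4"
    sink: str         # importer region, e.g. "App1.R2"
    policy: str       # match policy, e.g. "REGU"
    precision: float  # tolerance p, e.g. 0.5


# Sample text modelled on the slide; the elided "..." fields are left out.
CONFIG = """\
#
App0 cluster0 /bin/App0 2
App1 cluster1 /bin/App1 4
App2 cluster2 /bin/App2 16
App4 cluster4 /bin/App4 4
#
App0.R1 App4.R0 REGL 0.05
App0.R1 App2.R0 REG 0.1
App0.R4 App1.R2 REGU 0.5
#
"""


def parse_connections(text: str) -> List[Connection]:
    """Assumption: '#' separates sections and the second section lists connections."""
    sections = [s for s in text.split("#") if s.strip()]
    conns = []
    for line in sections[1].strip().splitlines():
        source, sink, policy, precision = line.split()
        conns.append(Connection(source, sink, policy, float(precision)))
    return conns


def match_regu(exported: Sequence[float], t: float, p: float) -> Optional[float]:
    """REGU: the smallest exported t' with t <= t' <= t + p, or None if there is none."""
    candidates = [e for e in exported if t <= e <= t + p]
    return min(candidates, default=None)


conn = parse_connections(CONFIG)[2]        # App0.R4 -> App1.R2, REGU, 0.5
exported_times = [1.0, 1.6, 2.2, 2.8]      # hypothetical export timestamps of App0.R4
print(conn, match_regu(exported_times, 2.0, conn.precision))
```

Because the policy and precision live in the configuration text, changing how two components are coupled requires no change to either component's code.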


Dissection of Execution Time

• Execution time is composed of
  – Computation time (Tcomp)
  – Buffering time (Tbuf)
  – Matched data transfer time (Ttran)

• Tbuf matters when exporter components (data sources) run more slowly

• Ttran matters when importer components (data sinks) run more slowly


Collective Buffering (when exporters run more slowly)

• Fastest export process sends runtime match results to slower processes in the same program

• Unnecessary memory copies can be avoided in slower processes

• Optimal State: only required exported data are buffered
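A toy model may help make the saving concrete. In the sketch below, a slower export process skips the memory copy for any timestamp that a faster process has already reported as not needed. The class `ExportProcess`, the `receive_match_results` call, and the timestamps are hypothetical; real message passing between processes is replaced by a plain dictionary.

```python
from typing import Dict, List


class ExportProcess:
    """One (slow) process of an exporter program, simulated in a single thread."""

    def __init__(self) -> None:
        self.decisions: Dict[int, bool] = {}      # timestamp -> needed by an importer?
        self.buffer: Dict[int, List[float]] = {}  # buffered copies of exported data
        self.copies = 0

    def receive_match_results(self, results: Dict[int, bool]) -> None:
        # Stand-in for the message sent by the fastest process.
        self.decisions.update(results)

    def export(self, t: int, data: List[float]) -> None:
        if self.decisions.get(t) is False:
            return                     # known to be unneeded: skip the copy
        self.buffer[t] = list(data)    # needed or still undecided: buffer it
        self.copies += 1


def run(known: Dict[int, bool], steps: int = 6) -> int:
    """Export timestamps 1..steps and return how many memory copies were made."""
    proc = ExportProcess()
    proc.receive_match_results(known)
    for t in range(1, steps + 1):
        proc.export(t, [float(t)] * 4)
    return proc.copies


needed = {3, 6}                                    # hypothetical matched timestamps
all_results = {t: (t in needed) for t in range(1, 7)}
partial = {t: v for t, v in all_results.items() if t <= 4}

print(run({}), run(partial), run(all_results))     # prints "6 3 2"
```

With no match results every export is copied; with results for timestamps 1 to 4 only the needed and the undecided ones are copied; with all results known only the required data are buffered, which corresponds to the optimal state.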


Collective Buffering Result

[Figure: data exporting time for the slowest process. Annotations: Copy All; Copy Some; Only Copy Required (Optimal State).]


Eager Transfer + Distributed Match (when importer runs more slowly)

• Bandwidth and latency both contribute to matched data transfer time

• Eager transfer, transferring predicted data in advance, addresses the bandwidth issue

• Distributed approximate match, running on both exporter and importer, addresses the latency issue


[Figure: comparison of Original, ET Only (eager transfer only), and ET+DM (eager transfer + distributed match).]


Conclusion

• Runtime-based approximate match is a solution for coupling components with different time scales

• Performance can be improved
  – When exporter runs more slowly, avoid unnecessary memory copies
  – When importer runs more slowly, transfer predicted data and meta-data in advance


The End


Questions? (http://meou.us)



On-Demand Approach

• Import component makes request

• Perform approximate match on export component, and then transfer matched data

• Needs data transfer time (T3 – T2) and 2 one-way delays (T2 – T1)


Eager Transfer Only

• Get permission to push predicted data

• Transfer predicted data in advance

• Import component makes request

• Perform approx match on export component

• Need 2 one-way delays (T16 – T15)


Eager Transfer With Distributed Match

• …

• Transfer predicted data + meta-data in advance

• Import component's request becomes a local operation

• Only local operation time (T26 – T25) is needed, independent of one-way delay
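The three approaches differ in what the importer has to wait for at request time. A back-of-the-envelope model, assuming the prediction is correct, the pushed data have already arrived, and matching time on the exporter is negligible, could look like the sketch below. The function names and the sample numbers are made up for illustration and are not measurements.

```python
def on_demand_wait(one_way_delay_ms: int, transfer_ms: int) -> int:
    """On-demand: 2 one-way delays plus the matched data transfer time."""
    return 2 * one_way_delay_ms + transfer_ms


def eager_only_wait(one_way_delay_ms: int) -> int:
    """Eager transfer only: data are already local, but the match still needs
    a round trip to the export component (2 one-way delays)."""
    return 2 * one_way_delay_ms


def eager_distributed_wait(local_match_ms: int) -> int:
    """Eager transfer + distributed match: data and meta-data are local, so
    only the local operation time remains."""
    return local_match_ms


# Hypothetical numbers: 50 ms one-way delay, 400 ms transfer, 1 ms local match.
delay_ms, transfer_ms, local_ms = 50, 400, 1
print("on-demand:", on_demand_wait(delay_ms, transfer_ms), "ms")
print("ET only:  ", eager_only_wait(delay_ms), "ms")
print("ET+DM:    ", eager_distributed_wait(local_ms), "ms")
```

In this model only the last case is independent of the network delay between the two components.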


All Together


Supported matching policies

<importer request, exporter matched, desired precision> = <x, f(x), p>

• LUB: minimum f(x) with f(x) ≥ x
• GLB: maximum f(x) with f(x) ≤ x
• REG: f(x) minimizes |f(x)-x| with |f(x)-x| ≤ p
• REGU: f(x) minimizes f(x)-x with 0 ≤ f(x)-x ≤ p
• REGL: f(x) minimizes x-f(x) with 0 ≤ x-f(x) ≤ p
• FASTR: any f(x) with |f(x)-x| ≤ p
• FASTU: any f(x) with 0 ≤ f(x)-x ≤ p
• FASTL: any f(x) with 0 ≤ x-f(x) ≤ p
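The eight policies can be written down directly from these definitions. The following is a minimal sketch over a plain list of exported timestamps; the function name `match` is hypothetical, "any" is taken to mean the first acceptable timestamp in list order, and ties in REG go to the earlier list entry. Both choices are assumptions, since the definitions leave them open.

```python
from typing import Optional, Sequence


def match(policy: str, x: float, p: float, exported: Sequence[float]) -> Optional[float]:
    """Return the exported timestamp f(x) chosen for request x, or None if no match."""
    upper = [f for f in exported if 0 <= f - x <= p]   # within p above x
    lower = [f for f in exported if 0 <= x - f <= p]   # within p below x
    near = [f for f in exported if abs(f - x) <= p]    # within p on either side

    if policy == "LUB":    # least upper bound; p is not used
        return min((f for f in exported if f >= x), default=None)
    if policy == "GLB":    # greatest lower bound; p is not used
        return max((f for f in exported if f <= x), default=None)
    if policy == "REG":
        return min(near, key=lambda f: abs(f - x), default=None)
    if policy == "REGU":
        return min(upper, default=None)
    if policy == "REGL":
        return max(lower, default=None)
    if policy == "FASTR":
        return next(iter(near), None)
    if policy == "FASTU":
        return next(iter(upper), None)
    if policy == "FASTL":
        return next(iter(lower), None)
    raise ValueError(f"unknown policy: {policy}")


exported = [1.0, 2.0, 3.0, 4.0]
print(match("LUB", 2.5, 0.0, exported))    # 3.0
print(match("REGL", 2.5, 0.5, exported))   # 2.0
print(match("REGU", 2.5, 0.25, exported))  # None: nothing in [2.5, 2.75]
```

The FAST variants can return as soon as one acceptable timestamp is seen, whereas the REG variants have to consider every candidate inside the precision window before choosing.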