Parallelizing Spatial Data Mining Algorithms: A Case Study with Multiscale and Multigranular Classification
PGAS 2006
Vijay Gandhi, Mete Celik, Shashi Shekhar
Army High Performance Computing and Research Center (AHPCRC)
University of Minnesota
Overview
- PGAS Relevance, Application Domain
- Problem Definition
- Approach
- Experimental Results
- Conclusion & Future Work
PGAS Relevance
- How effective is UPC in parallelizing spatial applications?
- How effective is UPC in improving the productivity of researchers in the spatial domain?
Spatial Applications: An Example
Multiscale Multigranular Image Classification
[Figure: input image and output images at multiple scales]
MSMG Classification - Formulation
Model
$$\hat{M} = \arg\max_{M} \{\, l(x \mid M) - 2\,\mathrm{pen}(M) \,\}$$
- $x$ = observations
- $M$ = a classification model
- $l(x \mid M)$ = log-likelihood (quality measure) of $M$
- $\mathrm{pen}(M)$ = penalty function
Calculation of the log-likelihood $l(x \mid M)$
- Uses Expectation Maximization
- Computationally expensive: 7 hours of computation time for an input image of 512 x 512 pixels with 4 classes
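The formulation above is a penalized maximum-likelihood selection rule. The sketch below illustrates it in Python, assuming the usual form in which twice the penalty is subtracted from the log-likelihood. The model names and the numeric values are hypothetical and are not taken from the slides.

```python
from typing import Dict, Tuple


def select_model(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Return the name of the model that maximizes l(x|M) - 2 * pen(M).

    `candidates` maps a model name to (log_likelihood, penalty).
    """
    return max(
        candidates,
        key=lambda name: candidates[name][0] - 2.0 * candidates[name][1],
    )


# Hypothetical values, chosen only to illustrate the trade-off between
# fit (log-likelihood) and model complexity (penalty).
candidates = {
    "coarse": (-120.0, 1.0),   # score: -120 - 2 = -122
    "medium": (-100.0, 5.0),   # score: -100 - 10 = -110
    "fine": (-95.0, 12.0),     # score: -95 - 24 = -119
}

best = select_model(candidates)
print(best)  # medium
```

The finest model has the highest likelihood but loses once the penalty is applied. In the real algorithm each log-likelihood comes from an Expectation Maximization run, which is the expensive part.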
Spatial Application: Multiscale Multigranular Image Classification
Applications
- Land-cover change analysis
- Environmental assessment
- Agricultural monitoring
Challenges
- Expensive computation of the quality measure, i.e. likelihood
- Large amount of data
- Many dimensions
Pseudo-code: Serial Version
    Initialize parameters
    for each Class
        for each Spatial Scale
            for each Quad
                Calculate Quality Measure
            end for Quad
        end for Spatial Scale
    end for Class
    Post-processing
Q? What are the options for parallelization?
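The serial pseudo-code is a three-level loop nest, and each loop is a candidate dimension for parallelization. A minimal Python sketch of that structure follows. The function `quality_measure` is a hypothetical placeholder for the EM-based log-likelihood, and the sketch assumes the same number of quads at every scale, which the slides do not state.

```python
def quality_measure(class_id: int, scale: int, quad: int) -> float:
    """Hypothetical stand-in for the expensive EM-based log-likelihood."""
    return float(class_id + scale + quad)


def run_serial(num_classes: int, num_scales: int, quads_per_scale: int) -> dict:
    """Evaluate the quality measure for every (class, scale, quad) combination."""
    results = {}
    for c in range(num_classes):              # for each Class
        for s in range(num_scales):           # for each Spatial Scale
            for q in range(quads_per_scale):  # for each Quad
                results[(c, s, q)] = quality_measure(c, s, q)
    return results


results = run_serial(num_classes=4, num_scales=3, quads_per_scale=16)
print(len(results))  # 4 * 3 * 16 = 192 evaluations
```

Any of the three loops could in principle be distributed across processors; which one is chosen determines how many independent work units exist.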
Parallelization – Problem Definition
Given
- Serial version of a spatial data mining algorithm
- Likelihood of each specific class at each pixel
- Class hierarchy
- Maximum spatial scale
Find
- Parallel formulation of the algorithm
Objective
- Scalability, e.g. isoefficiency
Constraints
- Parallel platform: UPC
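For reference, the isoefficiency metric named in the objective relates problem size to processor count at a fixed efficiency. The symbols below are the standard ones from the isoefficiency literature (Grama, Gupta, and Kumar), not notation defined in these slides: $W$ is the problem size measured in serial work, $p$ the number of processors, $T_p$ the parallel run time, and $T_o = p\,T_p - W$ the total overhead.

$$
E = \frac{W}{p\,T_p} = \frac{1}{1 + T_o(W,p)/W},
\qquad
W = \frac{E}{1-E}\,T_o(W,p)
$$

The second relation gives the rate at which $W$ must grow with $p$ to hold $E$ constant. The slower that growth, the more scalable the parallel formulation.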
Challenges in Parallelization
Description of work
- Compute Quality Measure for combinations of Class-label, Scale, Quad (Spatial Unit)
Challenges
a) Variable workload across computations of quality measure
b) Many dimensions to parallelize, i.e. Class-label, Scale, Quad
c) Dependency across scales
Class-level Parallelization
    Initialize parameters and memory
    upc_forall Class
        for each Spatial Scale
            for each Quad
                Calculate Quality Measure
            end for Quad
        end for Spatial Scale
    end upc_forall Class
    Post-processing
Class-level Parallelization
Disadvantages:
- Workload distribution is uneven (cost of the quality measure changes with class)
- Number of parallel processors is restricted to the number of classes
[Figure: examples]
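Both disadvantages can be illustrated with a small simulation. The Python sketch below assumes a cyclic affinity, where iteration `i` of the parallel loop runs on thread `i % THREADS` (the behaviour of a `upc_forall` with an integer affinity expression). The slides do not say which affinity was used, and the per-class costs are hypothetical.

```python
def round_robin_owner(index: int, threads: int) -> int:
    """Thread that owns iteration `index` under a cyclic (index % threads) affinity."""
    return index % threads


def class_level_makespan(class_costs, threads):
    """Return (time of the slowest thread, per-thread load) when whole
    classes are assigned to threads cyclically."""
    load = [0.0] * threads
    for c, cost in enumerate(class_costs):
        load[round_robin_owner(c, threads)] += cost
    return max(load), load


# Hypothetical per-class costs in arbitrary units; the slides only say they differ.
class_costs = [10.0, 4.0, 3.0, 3.0]
serial_time = sum(class_costs)

for threads in (1, 2, 4, 8):
    makespan, load = class_level_makespan(class_costs, threads)
    idle = sum(1 for x in load if x == 0.0)
    print(threads, round(serial_time / makespan, 2), idle)
```

With these costs the speed-up stalls at 2.0 from 4 threads onward: the most expensive class dominates the run time, and on 8 threads half of the threads receive no class at all.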
Quad-level Parallelization
    Initialize parameters and memory
    for each Spatial Scale
        upc_forall Quad
            for each Class
                Calculate Quality Measure
            end for Class
        end upc_forall Quad
        upc_barrier
    end for Spatial Scale
    Post-processing
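The structure of the quad-level version (serial over scales, parallel over quads, a barrier after each scale) can be sketched in Python with a thread pool. This is a structural analogy only, not UPC. The dependency used here, where a quad reads the result of its parent `quad // 4` from the previous scale, is a hypothetical example of a dependency across scales, and `quality_measure` is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

NUM_CLASSES = 4  # as in the slides' example


def quality_measure(class_id, scale, quad, previous):
    """Hypothetical stand-in for the EM-based measure; reads the parent quad
    (quad // 4) from the previous scale to model a cross-scale dependency."""
    return 1.0 + previous.get((class_id, quad // 4), 0.0)


def process_quad(scale, quad, previous):
    """Inner serial loop over classes for a single quad."""
    return {
        (c, quad): quality_measure(c, scale, quad, previous)
        for c in range(NUM_CLASSES)
    }


def run_quad_level(quads_per_scale, workers):
    previous = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for scale, num_quads in enumerate(quads_per_scale):  # serial over scales
            work = partial(process_quad, scale, previous=previous)
            current = {}
            # Consuming every result before moving on plays the role of the barrier.
            for part in pool.map(work, range(num_quads)):
                current.update(part)
            previous = current  # the next scale may depend on this one
    return previous


final = run_quad_level([1, 4, 16], workers=4)
print(len(final))  # 4 classes * 16 quads at the last scale = 64
```

CPython threads do not speed up CPU-bound work, so this shows only the control structure: the scale loop stays serial because of the dependency, while the quads within one scale are independent.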
Quad-level Parallelization
Advantages:
- Workload distribution is more even
- Greater number of processors can be used
- Number of Quads = f(Number of pixels)
Example
Input: 4 Classes, Scale of 6

| Input Image Size | Number of Quads |
|---|---|
| 64 x 64 | 98,304 |
| 128 x 128 | 393,216 |
| 512 x 512 | 6,291,456 |
| 1024 x 1024 | 25,165,824 |
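The slide gives the quad count only as f(Number of pixels). The four table values happen to equal classes x scales x pixels (4 x 6 x width x height). The sketch below checks that pattern; it is an observation about the numbers, not a formula stated by the authors.

```python
def num_quads(width: int, height: int, num_classes: int = 4, max_scale: int = 6) -> int:
    """Quad count consistent with the table: classes * scales * pixels.

    This product is an inferred pattern, not a formula given in the slides.
    """
    return num_classes * max_scale * width * height


# Values copied from the table (4 classes, scale of 6).
table = {64: 98_304, 128: 393_216, 512: 6_291_456, 1024: 25_165_824}

for side, expected in table.items():
    assert num_quads(side, side) == expected

print(num_quads(64, 64))  # 98304
```

Under this reading the number of work units grows linearly with the pixel count, so even the smallest input offers far more units than processors.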
Experimental Design
Input
- 64 x 64 pixel image (Plymouth County, Massachusetts)
- 4 class labels (Everything, Woodland, Vegetated, Suburban)
Language
- UPC
Hardware Platform
- Cray X1
Number of Processors
- 1-8
Workload
[Figure: input class hierarchy; output images at multiple scales (panels: Scale 64 x 64, Scale 2 x 2)]
Effect of Number of Processors
- Quad-level parallelization gives better speed-up
- Room for speed-up for both approaches
- Q? Class-level << Quad-level. Why?
[Figure: Efficiency vs. Number of Processors (1, 2, 4, 8); series: Class-level, Quad-level]
[Figure: Speedup vs. Number of Processors (2, 4, 8); series: Class-level, Quad-level]
Workload Distribution
- Quad-level parallelization provides better load balance
- Probably because of the large number of Quads (~100,000)
Fixed parameter: number of processors = 4
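The link between the number of work units and load balance can be illustrated with a small simulation on 4 processors. The sketch assumes cyclic assignment of units to processors and uses a synthetic, deterministic cost pattern (1, 2, 3, 4, 5, repeating). Both are assumptions made for illustration, not measurements from the experiment.

```python
def imbalance(costs, processors):
    """Load of the busiest processor divided by the mean load,
    under cyclic assignment (unit i goes to processor i % processors).
    A value of 1.0 means perfect balance."""
    load = [0.0] * processors
    for i, cost in enumerate(costs):
        load[i % processors] += cost
    return max(load) / (sum(load) / processors)


def synthetic_costs(n):
    """Deterministic, uneven costs 1, 2, 3, 4, 5, 1, 2, ... (hypothetical)."""
    return [1.0 + (i % 5) for i in range(n)]


few = imbalance(synthetic_costs(4), 4)        # 4 units, as with 4 classes
many = imbalance(synthetic_costs(98_304), 4)  # as many units as quads for 64 x 64

print(round(few, 2), round(many, 5))
```

With only 4 units, the busiest processor carries 1.6 times the mean load. With roughly 100,000 units the individual cost differences average out and the ratio is within a fraction of a percent of 1.0.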
Conclusions
How effective is UPC in parallelizing spatial applications?
- Quad-level parallelization
  - Speed-up of 6.65 on 8 processors
  - Large number of Quads (98,304)
- Class-level parallelization
  - Speed-ups are lower
  - Smaller number of Classes (4)
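The reported speed-up can be converted to parallel efficiency using the standard definition, efficiency = speed-up / number of processors. The sketch below also computes an upper bound for the class-level approach, assuming each class is an indivisible unit of work so that at most 4 of the 8 processors can be busy. The bound is an illustration, not a measured result.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: achieved speed-up as a fraction of the ideal speed-up."""
    return speedup / processors


# Reported in the slides: quad-level speed-up of 6.65 on 8 processors.
quad_eff = efficiency(6.65, 8)

# Class-level: with 4 indivisible classes, speed-up cannot exceed min(4, 8) = 4.
class_eff_bound = efficiency(min(4, 8), 8)

print(round(quad_eff, 3), class_eff_bound)  # 0.831 0.5
```

Even in the best case, class-level parallelization on 8 processors cannot exceed 50% efficiency, which is consistent with its lower measured speed-ups.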
Conclusions
How effective is UPC in improving the productivity of researchers in the spatial domain?
- Coding effort was reduced
  - 20 lines of new code in a program with a base size of 2000 lines
  - 1 person-month
- Analysis effort refocused
  - Identify units of parallel work, i.e. Quality Measure
  - Identify dimensions to parallelize, i.e. Quad, Class, Scale
  - Select dimension(s) to parallelize
    - Dependency analysis (ruled out Scale)
    - Number of units (the larger the better)
    - Load balancing
  - 6 person-months
Future Work
- Improve efficiency
- Explore dynamic load balancing
- Other parallel formulations
Acknowledgements
- Spatial Databases / Spatial Data Mining Group
- AHPCRC
- Richard Welsh, NCS
- University of Boston: Junchang Ju, Eric D. Kolaczyk, Sucharita Gopal
References
- E. D. Kolaczyk, J. Ju, and S. Gopal. Multiscale, Multigranular Statistical Image Segmentation. Journal of the American Statistical Association, 100, 1358-1369, 2005.
- Z. Kato, M. Berthod, and J. Zerubia. A hierarchical Markov random field model and multi-temperature annealing for parallel image classification. Graphical Models and Image Processing, 58(1):18-37, January 1996.
- A. Y. Grama, A. Gupta, and V. Kumar. Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures. IEEE Parallel & Distributed Technology: Systems & Applications, 1, 12-21, 1993.