Spark and C Integration
Big Data, Spark and Python: C/C++ Integration
Moshe Kaplan
[email protected]
© All rights reserved: Moshe Kaplan, http://top-performance.blogspot.com
C++ Integration using PIPE
Compile the code: /usr/bin/gcc -o /tmp/simple /tmp/simple.c
Copy the executable to a central location (S3)
Distribute the code to all nodes
Use pipe() for integration
Clean up the old code
>>> names = sc.parallelize(["Don", "Betty", "Sally"])
>>> piped = names.pipe("/tmp/simple")
>>> piped.collect()
General Concept: PySpark Internals
import os
import shutil

num_worker_nodes = 1

def copyFile(filepath):
    # Copy the executable from DBFS (shared storage) to the local
    # filesystem of whichever node runs this task, and make it executable.
    shutil.copyfile("/dbfs%s" % filepath, filepath)
    os.system("chmod u+x %s" % filepath)

# Schedule more tasks than nodes so every worker runs copyFile at least once.
sc.parallelize(range(0, 2 * (1 + num_worker_nodes))).map(lambda s: copyFile("/tmp/simple")).count()
C++: Copy code to nodes
Thank You!
Moshe Kaplan
[email protected]