Spark and C Integration

Big Data, Spark and Python: C/C++ Integration Moshe Kaplan [email protected]

Page 1: Spark and C Integration

Big Data, Spark and Python: C/C++ Integration

Moshe Kaplan [email protected]

Page 2: Spark and C Integration

© All rights reserved: Moshe Kaplan http://top-performance.blogspot.com

C++ Integration using PIPE

Compile Code: /usr/bin/gcc -o /tmp/simple /tmp/simple.c

Copy Executable to central location (S3)

Distribute code to all nodes

Use pipe for integration

Clean the old code

>>> names = sc.parallelize(["Don", "Betty", "Sally"])
>>> piped = names.pipe("/tmp/simple")
>>> piped.collect()
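
The pipe call above writes each RDD element to the external program's standard input, one element per line, and turns each line the program prints into an element of the resulting RDD. The program is started separately for each partition. A minimal local sketch of that contract, runnable without a cluster, is shown here. The helper pipe_partition and the child program are hypothetical stand-ins: the slides do not show the source of /tmp/simple, so a small Python child that prints a greeting is assumed in its place.

```python
import subprocess
import sys

# Hypothetical stand-in for /tmp/simple (its source is not in the slides):
# read one name per line from stdin, write one greeting per line to stdout.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('Hello, ' + line.strip())\n"
)


def pipe_partition(elements, command):
    """Emulate pipe() for one partition: one element per input line,
    one output element per line printed by the external command."""
    payload = "".join(str(element) + "\n" for element in elements)
    result = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the external program exits non-zero
    )
    return result.stdout.splitlines()


greetings = pipe_partition(
    ["Don", "Betty", "Sally"],
    [sys.executable, "-c", CHILD_SOURCE],
)
print(greetings)
```

Because all traffic is line-oriented text, the external C or C++ program only has to read lines from stdin and write lines to stdout; it needs no knowledge of Spark.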

Page 3: Spark and C Integration

General Concept: PySpark Internals

Page 4: Spark and C Integration

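import os
import shutil

num_worker_nodes = 1

# Copy the executable from the shared /dbfs mount to the same path on the
# local disk of the current node, then make it executable.
def copyFile(filepath):
    shutil.copyfile("/dbfs%s" % filepath, filepath)
    os.system("chmod u+x %s" % filepath)

# Run more tasks than there are nodes so that the copy is likely to run on every node.
sc.parallelize(range(0, 2 * (1 + num_worker_nodes))).map(lambda s: copyFile("/tmp/simple")).count()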

C++: Copy code to nodes
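
The copy step can be tried without a cluster. The sketch below is an illustration under stated assumptions, not the author's code: it uses two temporary directories in place of the /dbfs mount and the node's local disk, a hypothetical helper named copy_executable, and os.chmod instead of shelling out to chmod.

```python
import os
import shutil
import stat
import tempfile


def copy_executable(shared_root, filepath, local_root):
    """Copy filepath from the shared root to the local root and set u+x.

    shared_root stands in for the /dbfs mount and local_root for the
    node's local disk; both are assumptions made for this sketch.
    """
    relative = filepath.lstrip("/")
    source = os.path.join(shared_root, relative)
    target = os.path.join(local_root, relative)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(source, target)
    # Same effect as "chmod u+x": add the owner-execute bit to the mode.
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


# Build a throwaway "shared" file so that the sketch is self-contained.
shared_root = tempfile.mkdtemp()
local_root = tempfile.mkdtemp()
os.makedirs(os.path.join(shared_root, "tmp"))
with open(os.path.join(shared_root, "tmp", "simple"), "w") as handle:
    handle.write("#!/bin/sh\ncat\n")

local_path = copy_executable(shared_root, "/tmp/simple", local_root)
is_executable = os.access(local_path, os.X_OK)
print(local_path, is_executable)
```

shutil.copyfile copies only the file contents and not the permission bits, which is why the execute bit has to be set again after the copy.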

Page 5: Spark and C Integration

Thank You!

Moshe Kaplan [email protected]