Alexander Pavlenko, Java Software Engineer, DataArt.
Transcript of Alexander Pavlenko, Java Software Engineer, DataArt.
[Slide 1]
Building a Spark Connector for Ryft, a high-speed hardware compute appliance
Aleksandr Pavlenko, Big Data Software Engineer, [email protected]
[Slide 2]
What is Apache Spark?
[Slide 3]
Ryft ONE: a hardware appliance for processing Big Data
[Slide 4]
Ryft Query Language
Query examples:
● Exact Search: (RAW_TEXT CONTAINS "Some Text")
● Edit Search: (RAW_TEXT CONTAINS FEDS("Some Text", DIST=2, ...))
● Date Search: (RECORD.date CONTAINS DATE(MM/DD/YYYY <= "04/05/2015"))
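Since these queries are plain strings, a client can assemble them with small string-building helpers. A minimal sketch, assuming nothing beyond the syntax shown above (the object and method names are illustrative, not part of the connector):

```scala
// Illustrative helpers for composing Ryft Query Language strings.
// Object and method names are hypothetical; only the query syntax
// follows the slide's examples.
object RyftQL {
  // Exact search over raw text.
  def exactSearch(text: String): String =
    s"""(RAW_TEXT CONTAINS "$text")"""

  // Fuzzy edit-distance search; DIST is the maximum edit distance.
  def editSearch(text: String, dist: Int): String =
    s"""(RAW_TEXT CONTAINS FEDS("$text", DIST=$dist))"""

  // Date comparison against a record field.
  def dateOnOrBefore(field: String, date: String): String =
    s"""(RECORD.$field CONTAINS DATE(MM/DD/YYYY <= "$date"))"""
}
```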
[Slide 5]
Ryft REST Service
[Slide 6]
Spark Ryft Connector
Use Cases:
● Financial services
● Customer visibility
● Call center records
● Security and defense
● e-Medical records
● Genomic research
● IoT sensors and devices
● Supply chain logistics
[Slide 7]
Supercharging Spark with Ryft
*Benchmark comparisons against Apache Spark running on a cluster of AWS EC2 c3.8xlarge "Compute Optimized" 2U servers that require 1,100 watts each.
http://www.ryft.com/products#performance-proof
[Slide 8]
RDD - Resilient Distributed Dataset
abstract class RDD[T](...) {
  @DeveloperApi
  def compute(split: Partition, context: TaskContext): Iterator[T]

  protected def getPartitions: Array[Partition]

  protected def getPreferredLocations(split: Partition): Seq[String] = Nil
}
[Slide 9]
Ryft RDD
*Typical query: http://ryftone0/search?query=(RAW_TEXT CONTAINS "test")&files=somefile.txt
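A search like the typical query above is an HTTP GET, so issuing it reduces to building and percent-encoding a URL. A sketch in plain Scala (the `ryftSearchUrl` helper is hypothetical; the host and parameter names come from the example URL):

```scala
import java.net.URLEncoder

// Build a /search URL of the form shown in the typical query.
// `ryftSearchUrl` is a hypothetical helper, not the connector's API.
def ryftSearchUrl(host: String, query: String, files: Seq[String]): String = {
  def enc(s: String) = URLEncoder.encode(s, "UTF-8")
  // One `query` parameter followed by one `files` parameter per file.
  val params = ("query" -> query) +: files.map("files" -> _)
  val queryString = params.map { case (k, v) => s"$k=${enc(v)}" }.mkString("&")
  s"http://$host/search?$queryString"
}
```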
[Slide 10]
Ryft RDD Example
import com.ryft.spark.connector._
...
val sc = new SparkContext(sparkConf)
val query = RecordQuery(recordField("Description") contains IPv4Value(IP === IPv4("192.168.190.151")))
val ryftOptions = RyftQueryOptions("data/*", xml)
val ryftRDD = sc.ryftRDD(Seq(query), ryftOptions)
...
[Slide 11]
Data Locality & Partitioning Mechanism
abstract class RDD[T](...) {
  ...
  protected def getPreferredLocations(split: Partition): Seq[String] = Nil
  ...
}
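Overriding `getPreferredLocations` is how a connector tells Spark which Ryft node holds a partition's data, so the task is scheduled close to it. The assignment itself can be a simple deterministic mapping from file to host; a sketch with hypothetical names:

```scala
// Hypothetical locality assignment: map each partition's file to one of the
// known Ryft hosts via a stable hash. Returning this host from
// getPreferredLocations lets Spark schedule the task near the data.
def preferredHost(file: String, hosts: Vector[String]): String =
  hosts(Math.floorMod(file.hashCode, hosts.size))
```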
[Slide 12]
Ryft DataFrame Support
Mapping of structured data at Ryft (JSON/XML) to a DataFrame
RyftRelation extends BaseRelation with PrunedFilteredScan
val schema = StructType(Seq(
  StructField("Arrest", BooleanType),
  StructField("Date", TimestampType),
  StructField("Description", StringType),
  StructField("ID", StringType)
))
sqlContext.read.ryft(schema, xml, "*.crimestat", "temp_table",
  Map("date_format" -> "MM/dd/yyyy hh:mm:ss aa"))
sqlContext.sql("""SELECT Date, ID, Description, Arrest FROM temp_table
  WHERE Date = '2015-04-15 23:59:00' ORDER BY Date""")
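PrunedFilteredScan hands the relation the required columns and the WHERE filters, so they can be pushed down as a Ryft query instead of being evaluated in Spark. A simplified sketch of that translation step (the Filter ADT and the AND combinator here are stand-ins I made up for illustration, not Spark's org.apache.spark.sql.sources.Filter):

```scala
// Simplified stand-ins for Spark's pushed-down filters, for illustration only.
sealed trait Filter
case class Contains(attr: String, value: String) extends Filter
case class And(left: Filter, right: Filter) extends Filter

// Translate a pushed-down filter tree into a Ryft Query Language fragment.
def toRyftQuery(f: Filter): String = f match {
  case Contains(attr, v) => s"""(RECORD.$attr CONTAINS "$v")"""
  case And(l, r)         => s"(${toRyftQuery(l)} AND ${toRyftQuery(r)})"
}
```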
[Slide 13]
Ryft Twitter Demo
[Slide 14]
Q & A