Stanford Computer Forum - Secure IoT Workshop, April...

Secure IoT Workshop, April 2016


Page 1: Stanford Computer Forum - Secure IoT Workshop, April 2016forum.stanford.edu/events/2016/slides/iot/Luke.pdf · DeepDive Framework for building machine learning systems DeepDive applications


Overview


● Building a Component Library

○ Motivation

○ Approach

○ Key Insights and Results

○ Future Work

○ Summary


Problem 1: There are a lot of components

 Component Family    | Number of Entries on Parts.io
---------------------+-------------------------------
 Connectors          | 35,835,399
 Power Circuits      | 4,308,382
 Diodes              | 2,200,889
 Sensors/Transducers | 1,429,819
 Memories            | 1,319,651
 Microcontrollers    | 818,329
 Transistors         | 659,628
 Drivers/Interfaces  | 121,955
 Amplifiers          | 98,506
 Transformers        | 86,021


Problem 2: Components are complicated


DigiKey Information for Atmel SAMD21

 Characteristic             | Value
----------------------------+-------------------------------------------------
 Manufacturer Part Number   | ATSAMD21E17A-MUT
 Description                | IC MCU 32BIT 128KB FLASH 32QFN
 Core Processor / Core Size | ARM Cortex M0+ / 32-Bit
 Speed                      | 48MHz
 Connectivity               | I²C, LIN, SPI, UART/USART
 Peripherals                | Brown-out Detect/Reset, DMA, I²S, POR, PWM, WDT
 Number of I/O              | 26
 Program Memory Size / Type | 128KB (128K x 8) / FLASH
 RAM Size                   | 16K x 8
 Voltage Supply             | 1.62 V ~ 3.6 V
 Data Converters            | A/D 10x12b, D/A 1x10b
 Operating Temperature      | -40°C ~ 85°C

The datasheet for this part contains 1108 pages


Problem 2: Components are complicated

● Websites lack the details necessary to actually build boards

● Requires designer to consult PDFs


Step 1: Transistors and Tables

● Datasheets are usually small (1-4 pages)

● Variety of formats

● Relatively small schema

● Data primarily in tables

● Example: match part numbers to minimum storage temperatures


Approach

Table Extractor

 doc   | part_num | storage_temp_min
-------+----------+------------------
 X.pdf | BC546    | -55
 X.pdf | BC547    | -55
 X.pdf | BC548    | -55
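The target relation on this slide can be modeled as a small typed record. The sketch below is illustrative only: the class name `StorageTempRow` and the helper `index_by_part` are hypothetical, and the unit (degrees Celsius) is an assumption, since the slide does not state one.

```python
from typing import Dict, List, NamedTuple, Tuple


class StorageTempRow(NamedTuple):
    """One output row: which document says which part has which minimum."""
    doc: str
    part_num: str
    storage_temp_min: int  # assumed to be degrees Celsius


# The three rows shown on the slide.
rows: List[StorageTempRow] = [
    StorageTempRow("X.pdf", "BC546", -55),
    StorageTempRow("X.pdf", "BC547", -55),
    StorageTempRow("X.pdf", "BC548", -55),
]


def index_by_part(rows: List[StorageTempRow]) -> Dict[Tuple[str, str], int]:
    """Build a lookup keyed by (document, part number)."""
    return {(r.doc, r.part_num): r.storage_temp_min for r in rows}


lookup = index_by_part(rows)
print(lookup[("X.pdf", "BC547")])
```

Keying on the document as well as the part number keeps values from different datasheets apart if they ever disagree.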

Page 11: Stanford Computer Forum - Secure IoT Workshop, April 2016forum.stanford.edu/events/2016/slides/iot/Luke.pdf · DeepDive Framework for building machine learning systems DeepDive applications

Extracting Tables from PDFs

● Rule-based tools exist to extract data from PDFs

● Results are noisy and depend on the format of the PDF

● Potential signals are lost in the output

● Hard problem with active research: ICDAR, JIS, CIKM

● We made our own simple extractor for testing



Challenges in Table Extraction


DeepDive

● Framework for building machine learning systems

● DeepDive applications have achieved better-than-human accuracy

● Operates on two key components: candidates and features


Candidate Extraction

● A candidate is a (part number, minimum storage temperature) pair

● Design decisions significantly impact performance

● Generality is better

○ Match part numbers with numbers in general, rather than part numbers with storage temperatures specifically

○ More features to train on
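The "generality is better" idea can be sketched as follows: pair every part-number-like cell with every numeric cell and leave it to the learned model to decide which pairs are real. This is a minimal sketch in plain Python, not DeepDive's API. The regular expressions, the `Candidate` type, and `extract_candidates` are all hypothetical names chosen for illustration.

```python
import re
from itertools import product
from typing import List, NamedTuple

# Hypothetical patterns, chosen only for this illustration.
PART_RE = re.compile(r"^[A-Z]{2,}\d{2,}[A-Z]?$")  # e.g. BC546
NUM_RE = re.compile(r"^[-+]?\d+(?:\.\d+)?$")      # e.g. -55, 150, 0.5


class Candidate(NamedTuple):
    part_num: str
    number: float


def extract_candidates(cells: List[str]) -> List[Candidate]:
    """Pair every part-number-like cell with every numeric cell."""
    parts = [c for c in cells if PART_RE.match(c)]
    numbers = [float(c) for c in cells if NUM_RE.match(c)]
    return [Candidate(p, n) for p, n in product(parts, numbers)]


# Cell texts as they might come out of a table extractor.
cells = ["BC546", "BC547", "Storage temperature", "-55", "150", "°C"]
candidates = extract_candidates(cells)
for c in candidates:
    print(c)
```

The deliberately broad pairing also yields wrong pairs such as (BC546, 150). Those are the negative examples a classifier needs to separate from the correct ones.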


Feature Extraction

● Not all features are created equal

○ Alignment and sibling words

○ Number characteristics

○ Nearest part number

○ Position

● Ideally minimize computations and data needed to label candidates with features

○ Candidate features as a function of individual features
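One way to read "candidate features as a function of individual features" is to compute features once per table cell and then combine them for each (part number, number) pair. The sketch below assumes cells carry row and column indices. The `Cell` type, the feature names, and the Manhattan-distance notion of "nearest" are illustrative assumptions, not the features used in the talk.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    text: str
    row: int
    col: int


def part_features(part: Cell) -> List[str]:
    """Position features of the part-number cell alone."""
    return [f"PART_ROW_{part.row}", f"PART_COL_{part.col}"]


def number_features(number: Cell, row_words: List[str]) -> List[str]:
    """Number characteristics plus sibling words from the same row."""
    value = float(number.text)
    feats = [
        "NUM_NEGATIVE" if value < 0 else "NUM_NON_NEGATIVE",
        "NUM_INTEGER" if value.is_integer() else "NUM_FRACTIONAL",
        f"NUM_COL_{number.col}",
    ]
    feats += [f"NUM_ROW_WORD_{w.lower()}" for w in row_words]
    return feats


def pair_features(part: Cell, number: Cell, all_parts: List[Cell],
                  row_words: List[str]) -> List[str]:
    """Combine per-cell features, then add alignment and nearest-part features."""
    feats = part_features(part) + number_features(number, row_words)
    if part.row == number.row:
        feats.append("ALIGNED_ROW")
    if part.col == number.col:
        feats.append("ALIGNED_COL")
    nearest = min(all_parts,
                  key=lambda p: abs(p.row - number.row) + abs(p.col - number.col))
    if nearest == part:
        feats.append("NEAREST_PART")
    return feats


parts = [Cell("BC546", 0, 1), Cell("BC547", 0, 2)]
number = Cell("-55", 3, 1)
row_words = ["Storage", "temperature", "min"]
f1 = pair_features(parts[0], number, parts, row_words)
f2 = pair_features(parts[1], number, parts, row_words)
print(f1)
print(f2)
```

Because the per-cell features do not depend on the pairing, they could be cached and reused across all candidates that share a cell, which is in the spirit of minimizing computation per candidate.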


Results

● When run on a set of 100 PDFs with 384 unique pairs

                | Predicted Incorrectly | Predicted Correctly
----------------+-----------------------+---------------------
 Positive Cases | 35                    | 349

90.9% of positives predicted correctly
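The reported percentage follows directly from the two counts on this slide. Assuming "positive cases" means the 384 ground-truth pairs, the figure is what is usually called recall. Precision cannot be derived from these numbers, because the slide gives no count of false positives.

```python
correct = 349  # positive pairs predicted correctly
missed = 35    # positive pairs predicted incorrectly

total_positives = correct + missed  # 384 unique pairs
recall = correct / total_positives

print(total_positives)   # 384
print(f"{recall:.1%}")   # 90.9%
```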


Results: Error Analysis

● Of those 35 entries we didn’t get…

● Error analysis guides improvement of features

● Build robust systems


Future Work

● Parse data from non-table elements


Future Work

● Analyze more complex datasheets

○ Microcontrollers that contain subcomponents

■ USB, Bluetooth, etc.

○ Datasheets that explain part numbers


Future Work

● Process data from sources outside of PDFs

○ Information from distributors

■ Pricing, Popularity, Availability

○ Drivers

○ Reference schematics

○ Example application code

○ Development tools


Summary

● Building a Component Library

○ There are millions of components of varying complexity

○ Machine learning can be used to extract data from PDFs

○ Success will enable exciting applications

■ Embedded Device Generation

■ Detailed search engine for components

■ Data analytics, and more


Questions?