GraphTour - Workday: Tracking activity with Neo4j (English Version)
Tracking Activity with Neo4j
• Build Engineer (Build Engineering Team)
• Located in Paris, France
• Responsibilities
‒ Development of reusable Gradle plugins
‒ Administration of Artifactory
‒ Development of custom tools
‒ Support to engineering teams (mainly build-related)
‒ Sentinel (Server to track activity)
Who Am I?
Workday Confidential
The Build Engineering Mission
• Define policies for engineering teams
(dependency locking, artifact promotion,
artifact metadata)
• Provide reusable tooling (Gradle plugins &
other custom tools)
• Administer shared services (Artifactory)
• Provide assistance to engineering teams
Build Engineering – Our mission
• To ensure policies are followed
• Engineers enjoy a lot of freedom at Workday!
‒ Netflix: The Paved Road
• Is our tooling relevant?
• Gain insight into how development teams are working
We need answers to those questions!
Why We Need Monitoring
• Artifacts (JARs, RPMs, “Deliveries”, etc.)
• CI Builds (in Bamboo, TeamCity and Jenkins)
• SCM changes (in BitBucket, GitHub, Gerrit, etc.)
• Dependencies (between Artifacts, Builds, etc)
• JIRA issues (tracking of code)
• Promotions (of artifacts)
• Metadata in general
We’re interested in …
• No unified system of record with all this information!
• The data is scattered across different systems (AF, CI, JIRA…)
• … is secured with different credentials (AD, LDAP)
• … is stored in different formats (JSON, XML, CSV, etc.)
• … is not always easily accessible
• Accessing one data source is (usually) easy
• Accessing two data sources is already a bit trickier
• No unified query language for joining the aggregated data
Problem: The Data is Everywhere
Requirements
• Simple access to the information
• Unified and intuitive data model
• Powerful query language
• Data as accurate as possible
‒ Frequent updates to the data
‒ Updates must be fast (performance)
• Ability to easily refactor the data model
• Ability to expose this information to engineering teams (automation)
• We don’t want to rely on users to provide the information (unless we have no other choice)!
• The information we need usually already exists or can be derived, let’s use it!
But first of all!
Sentinel – Architecture Overview
Architecture Overview
[Diagram: Data Sources (JIRA, Artifactory, Bamboo, BitBucket, …) → Data Miner (aggregation, sanitization, normalization) → Neo4j → REST API and Web UI]
A foundation to solve current and future problems
• Command line tool (written in Groovy)
• Executable fat jar
• Runs from Bamboo every 15 mins
• Scans the data sources containing the information we need
• Preemptively extracts, sanitizes & normalizes the data
• Detects incremental changes (optimized for performance)
• Crash-proof
• A run executes 59 commands in sequence
• Scan time: 8 mins (min), 23 mins (average)
The Data Miner
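One way to picture the "detects incremental changes" bullet is a scan that remembers a checkpoint and only processes newer records. The sketch below is an assumption about how such a miner could work, not Sentinel's actual code; `Item`, `scan_incrementally` and the second timestamp are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A record returned by a data source (hypothetical shape)."""
    id: str
    created: int  # epoch milliseconds


def scan_incrementally(items, checkpoint):
    """Return the items created after `checkpoint`, plus the new checkpoint."""
    fresh = sorted(
        (item for item in items if item.created > checkpoint),
        key=lambda item: item.created,
    )
    new_checkpoint = fresh[-1].created if fresh else checkpoint
    return fresh, new_checkpoint


source = [
    Item("com.workday:core:1.0.5", 1458713182201),
    Item("com.workday:core:1.0.6", 1458799582201),  # hypothetical timestamp
]

# The first run sees everything; the second run finds nothing new.
first_run, checkpoint = scan_incrementally(source, checkpoint=0)
second_run, checkpoint = scan_incrementally(source, checkpoint)
```

Persisting the checkpoint between runs is what would let a 15-minute schedule skip data that was already mined.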
• A (NoSQL) graph database
• The graph paradigm fits our needs
• Very flexible and easy to use
• Schema-less
• Excellent performance
• All the useful data in one place
• Cypher (query language)!
The Neo4j Database
• UI made of HTML dashboards & dynamic charts
• REST API
• Spring Boot, Thymeleaf, D3.js, Swagger
The Services We Expose
Neo4j in a Nutshell
• Nodes have properties (Comparable to a Map<String, ?>)
• … can have 0-N labels (Typing, Polymorphism)
Neo4j - Nodes
[Example node: core 1.0.5 jar]
Labels: Artifact, ArtifactoryFile, Workday
Properties:
id com.workday:core
group com.workday
artifact core
version 1.0.5
created 1458713182201
• Relationships represent an edge between 2 nodes
• … have a name
• … can be directed
• … can have properties
Neo4j - Relationships
[Diagram: Artifact (core 1.0.5 jar) -[HAS_COMMIT]-> Git Commit (core 5ce1f767)]
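The node and relationship model from these two slides can be mimicked in a few lines of Python. This is only an illustration of the property-graph idea, not how Neo4j stores data; the `GitCommit` label spelling, the `module` and `sha` property names and the direction of HAS_COMMIT are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # 0-N labels
    properties: dict = field(default_factory=dict)   # like a Map<String, ?>


@dataclass
class Relationship:
    name: str
    start: Node                                      # directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


artifact = Node(
    labels={"Artifact", "ArtifactoryFile", "Workday"},
    properties={
        "id": "com.workday:core",
        "group": "com.workday",
        "artifact": "core",
        "version": "1.0.5",
        "created": 1458713182201,
    },
)
# Label and property names below are hypothetical.
commit = Node(labels={"GitCommit"}, properties={"module": "core", "sha": "5ce1f767"})
has_commit = Relationship("HAS_COMMIT", start=artifact, end=commit)
```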
Neo4j query language
A node ()
A labeled node (:Person)
A relationship between 2 nodes ()--()
A directed labeled relationship ()-[:PARENT_OF]->()
MATCH (parent:Person)-[:PARENT_OF]->(child:Person)
RETURN parent.name, COLLECT(child.name)
Neo4j - Cypher
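To make the result shape of the MATCH … COLLECT query concrete, here is a plain-Python sketch of the grouping it performs. The names are hypothetical and this says nothing about how Neo4j evaluates the query internally.

```python
from collections import defaultdict

# (parent, child) pairs standing in for (:Person)-[:PARENT_OF]->(:Person).
parent_of = [("Ann", "Bob"), ("Ann", "Cid"), ("Dan", "Eve")]


def children_by_parent(pairs):
    """Group child names per parent, like RETURN parent.name, COLLECT(child.name)."""
    grouped = defaultdict(list)
    for parent, child in pairs:
        grouped[parent].append(child)
    return dict(grouped)


rows = children_by_parent(parent_of)
```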
Extracting the Data
Everything Starts with Artifactory
• Official repository of Artifacts, RPMs, Docker images
• REST API to detect new artifacts in repositories
Step 1: Artifacts
• URI: com/workday/core/1.0.5/core-1.0.5-javadoc.jar
Group com.workday
Module core
Version 1.0.5
Type jar
Classifier javadoc
ID: “com.workday:core:1.0.5:javadoc@jar”
[Node: Artifact (core 1.0.5 jar, classifier javadoc)]
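The breakdown of the URI into coordinates can be sketched as follows. This assumes the standard Maven repository layout (group path / module / version / file) and single-part extensions; the function names are hypothetical, and the ID form without a classifier is a guess by analogy with the one shown on the slide.

```python
def parse_artifact_path(path):
    """Split a Maven-layout repository path into its coordinates."""
    parts = path.split("/")
    if len(parts) < 4:
        raise ValueError(f"unexpected path: {path}")
    *group_parts, module, version, filename = parts
    prefix = f"{module}-{version}"
    if not filename.startswith(prefix):
        raise ValueError(f"file name does not match module/version: {filename}")
    rest = filename[len(prefix):]            # e.g. "-javadoc.jar" or ".jar"
    stem, dot, extension = rest.rpartition(".")
    if not dot or (stem and not stem.startswith("-")):
        raise ValueError(f"unexpected file name: {filename}")
    return {
        "group": ".".join(group_parts),
        "module": module,
        "version": version,
        "type": extension,
        "classifier": stem[1:] or None,      # None when there is no classifier
    }


def artifact_id(coords):
    """Build an ID like "group:module:version:classifier@type"."""
    base = f"{coords['group']}:{coords['module']}:{coords['version']}"
    if coords["classifier"]:
        base += f":{coords['classifier']}"
    return f"{base}@{coords['type']}"   # form without classifier is an assumption


coords = parse_artifact_path("com/workday/core/1.0.5/core-1.0.5-javadoc.jar")
```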
Step 2: Module Versions
• The artifact relates to a “Module Version”
Group com.workday
Module core
Version 1.0.5
ID: “com.workday:core:1.0.5”
[Node: ModuleVersion (core 1.0.5)]
Step 3: Modules
• The module version relates to a “Module”
Group com.workday
Module core
ID: “com.workday:core”
[Node: Module (core)]
All Together With Relationships
[Diagram: Versions (1.0.5, 1.0.6, 1.0.7) are linked to Module core by VERSION_OF; Artifacts (jar, javadoc jar, sources jar) are linked to their Version by ARTIFACT_OF]
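Steps 2 and 3 derive the parent IDs by dropping trailing parts of the artifact ID. A minimal sketch, assuming the ID formats shown on the slides; the function names are hypothetical, and ARTIFACT_OF / VERSION_OF are the relationship names from the diagram.

```python
def module_version_id(artifact_id):
    """ "group:module:version[:classifier]@type" -> "group:module:version" """
    coords = artifact_id.split("@", 1)[0]
    group, module, version = coords.split(":")[:3]
    return f"{group}:{module}:{version}"


def module_id(mv_id):
    """ "group:module:version" -> "group:module" """
    group, module = mv_id.split(":")[:2]
    return f"{group}:{module}"


def relationships_for(artifact_id):
    """Relationships linking an artifact up to its module version and module."""
    mv_id = module_version_id(artifact_id)
    return [
        (artifact_id, "ARTIFACT_OF", mv_id),
        (mv_id, "VERSION_OF", module_id(mv_id)),
    ]


links = relationships_for("com.workday:core:1.0.5:javadoc@jar")
```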
Step 4: Artifact Dependencies
• Maven / Ivy descriptors → Dependencies
• Dependencies → DEPENDS_ON relationships
[Diagram: services 2.0.0, gson 2.2.2 and core 1.0.5, connected by DEPENDS_ON relationships]
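Turning a Maven descriptor into DEPENDS_ON relationships could look like the sketch below. The POM is a hypothetical sample (including the assumption that services depends on both gson and core), and the code ignores versions inherited from a parent POM or set through properties.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical descriptor, for illustration only.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.workday</groupId>
  <artifactId>services</artifactId>
  <version>2.0.0</version>
  <dependencies>
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.2.2</version>
    </dependency>
    <dependency>
      <groupId>com.workday</groupId>
      <artifactId>core</artifactId>
      <version>1.0.5</version>
    </dependency>
  </dependencies>
</project>
"""


def gav(element):
    """Return "group:artifact:version" for a <project> or <dependency> element."""
    return ":".join(
        element.findtext(f"m:{tag}", namespaces=POM_NS)
        for tag in ("groupId", "artifactId", "version")
    )


def depends_on_edges(pom_xml):
    """List (dependent, "DEPENDS_ON", dependency) triples declared in a POM."""
    project = ET.fromstring(pom_xml.strip())
    source = gav(project)
    dependencies = project.findall("m:dependencies/m:dependency", POM_NS)
    return [(source, "DEPENDS_ON", gav(dep)) for dep in dependencies]


edges = depends_on_edges(SAMPLE_POM)
```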
Step 5: Artifact Metadata
• Populated at build time (by a custom Gradle plugin)
• Captures information about
‒ Gradle, JDK, Build machine
‒ CI builds
‒ SCM changes
• Makes artifacts “self-documented”
Manifest Metadata – SCM Info
• WD-Git-Origin ssh://[email protected]/core/core.git
• WD-Git-Commit e28a60b96f452680c57cb76798def09fd171011f
[Diagram: Artifact (core 1.0.5 jar) -[HAS_COMMIT]-> Git Commit (core e28a60…)]
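Reading the SCM attributes back out of a jar manifest might look like this sketch. It handles only the main section and the manifest continuation rule (a line starting with a single space continues the previous value). The origin URL is a placeholder, and the commit is wrapped artificially just to exercise the continuation rule.

```python
def parse_manifest(text):
    """Parse the main section of a MANIFEST.MF into a dict."""
    attributes = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break                      # a blank line ends the main section
        if line.startswith(" ") and last_key:
            attributes[last_key] += line[1:]   # continuation of previous value
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            attributes[last_key] = value.strip()
    return attributes


# Placeholder host; the real origin is not reproduced here.
MANIFEST = (
    "Manifest-Version: 1.0\n"
    "WD-Git-Origin: ssh://git@git.example.invalid/core/core.git\n"
    "WD-Git-Commit: e28a60b96f452680c57cb76798def09fd1\n"
    " 71011f\n"
)

attrs = parse_manifest(MANIFEST)
scm_info = {"origin": attrs["WD-Git-Origin"], "commit": attrs["WD-Git-Commit"]}
```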
Concrete Examples
List of all Workday Artifacts
Group        Module        Latest Version  Age (days)  SCM url           SCM change  Build URL  Latest JIRAs
com.workday  core          1.0.5           120.2       core.git          e28a60b9    URL        CORE-120
com.workday  foo-services  1.3.0           29.1        foo-services.git  146ae135    URL        FOO-57
com.workday  bar-services  2.2.8           54.8        bar-services.git  b538c156    URL        BAR-70
…            …             …               …           …                 …           …          …
Public dashboard accessible with latest information (automatically up-to-date)
→ Where’s the build of this jar file?
→ Where are the sources for this jar file?
Identifying Direct Dependents
MATCH (dependent:ModuleVersion)-[:DEPENDS_ON]->(dependency:ModuleVersion)
WHERE dependency.id = "com.workday:core:1.0.5"
RETURN dependent.id AS dependent
Service in the Sentinel REST API
Dependent
com.workday:foo-services:1.3.0
com.workday:foo-services:1.2.5
com.workday:bar-services:2.2.8
com.workday:bar-services:2.2.7
…
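The same lookup can be sketched over an in-memory edge list to show what the query selects. This is an illustration only; the last edge is a hypothetical extra to show that unrelated dependencies are filtered out.

```python
# (dependent, dependency) pairs standing in for DEPENDS_ON relationships.
DEPENDS_ON = [
    ("com.workday:foo-services:1.3.0", "com.workday:core:1.0.5"),
    ("com.workday:foo-services:1.2.5", "com.workday:core:1.0.5"),
    ("com.workday:bar-services:2.2.8", "com.workday:core:1.0.5"),
    ("com.workday:bar-services:2.2.7", "com.workday:core:1.0.5"),
    # Hypothetical edge that must not appear in the result for core.
    ("com.workday:bar-services:2.2.8", "com.workday:foo-services:1.3.0"),
]


def direct_dependents(edges, dependency_id):
    """Return the IDs of module versions that depend directly on `dependency_id`."""
    return [dependent for dependent, dependency in edges if dependency == dependency_id]


dependents = direct_dependents(DEPENDS_ON, "com.workday:core:1.0.5")
```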
Build Orchestration
[Diagram: Producing build: Bamboo Build CORE -[BUILT]-> Artifact (core 1.0.5 jar) -[ARTIFACT_OF]-> ModuleVersion (core 1.0.5)]
[Consuming build: Bamboo Build FOO-SERVICES -[BUILT]-> Artifact (foo-services 1.3.0 jar) -[ARTIFACT_OF]-> ModuleVersion (foo-services 1.3.0)]
[DEPENDS_ON relationships connect the consuming side to the producing side]
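One possible reading of this diagram: after a producing build finishes, follow BUILT, ARTIFACT_OF and DEPENDS_ON to find the builds that consume its output. The sketch below is an assumption about how orchestration could use the graph, not a description of Sentinel; the shortened IDs and the dictionary layout are hypothetical.

```python
# build -> artifacts it BUILT
BUILT = {
    "CORE": ["core:1.0.5:jar"],
    "FOO-SERVICES": ["foo-services:1.3.0:jar"],
}
# artifact -> module version it is an ARTIFACT_OF
ARTIFACT_OF = {
    "core:1.0.5:jar": "core:1.0.5",
    "foo-services:1.3.0:jar": "foo-services:1.3.0",
}
# module version -> module versions it DEPENDS_ON
DEPENDS_ON = {"foo-services:1.3.0": ["core:1.0.5"]}


def consuming_builds(producing_build):
    """Builds whose module versions depend on what `producing_build` produced."""
    produced = {ARTIFACT_OF[a] for a in BUILT.get(producing_build, [])}
    consumers = set()
    for build, artifacts in BUILT.items():
        if build == producing_build:
            continue
        for artifact in artifacts:
            dependencies = DEPENDS_ON.get(ARTIFACT_OF[artifact], [])
            if produced & set(dependencies):
                consumers.add(build)
    return sorted(consumers)


to_trigger = consuming_builds("CORE")
```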
Automated Release Notes
[Diagram: Version 1 side: Bamboo Build CORE #11, Artifact (core v1 jar); Version 2 side: Bamboo Build CORE #12, Artifact (core v2 jar)]
[Git Commits: core 5ce1f767, core ee2a0e22; JIRA Issue: CORE-120]
[Relationships: BUILT, HAS_REVISION, PARENT_REVISION, LINKS_TO]
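Release notes between two versions could be derived by walking PARENT_REVISION from the newer commit back to the older one and collecting the linked JIRA issues. This is a sketch under several assumptions: a linear history, 5ce1f767 belonging to version 1 and ee2a0e22 to version 2, a hypothetical intermediate commit, and commit-to-issue links stored as shown.

```python
# child commit -> parent commit (assumed linear history)
PARENT_REVISION = {"ee2a0e22": "5954ff88", "5954ff88": "5ce1f767"}
# commit -> JIRA issues it links to (assumed data)
LINKS_TO = {"ee2a0e22": ["CORE-120"], "5954ff88": ["CORE-120"]}


def release_notes(old_commit, new_commit):
    """Collect JIRA issues linked to commits after old_commit, up to new_commit."""
    issues = []
    commit = new_commit
    while commit is not None and commit != old_commit:
        for issue in LINKS_TO.get(commit, []):
            if issue not in issues:
                issues.append(issue)
        commit = PARENT_REVISION.get(commit)
    return issues


notes = release_notes(old_commit="5ce1f767", new_commit="ee2a0e22")
```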
Identify SCM Changes per JIRA
Find all mentions of a JIRA in commit messages
Input: JIRA issue
Output: Set of SCM changes
[Diagram: JIRA Issue CORE-120 linked to Git Commits core 5ce1f767, core 5954ff88 and core ee2a0e22]
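Finding mentions of a JIRA key in commit messages is essentially a regular-expression scan. A minimal sketch: the commit messages are invented for illustration, and the key pattern (uppercase project key, dash, number) is an assumption about how keys are written.

```python
import re

# Matches keys such as CORE-120 or FOO-57.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

# Hypothetical commit messages keyed by abbreviated commit hash.
COMMITS = {
    "5ce1f767": "CORE-120: add activity tracker",
    "5954ff88": "Fix null check (CORE-120)",
    "ee2a0e22": "CORE-120 update docs",
    "146ae135": "FOO-57 bump version",
}


def commits_for_issue(issue_key, commits):
    """Return the set of commits whose message mentions `issue_key`."""
    return {
        sha
        for sha, message in commits.items()
        if issue_key in JIRA_KEY.findall(message)
    }


changes = commits_for_issue("CORE-120", COMMITS)
```

Matching whole keys rather than substrings keeps CORE-120 from also matching a message about CORE-1200.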
Rule: “No dynamic dependencies in Maven / Ivy files”
Rationale: Builds must be reproducible
Dynamic versions: 1.+, LATEST, [1.0, 2.0[
Detection of Rule Violations
HTML dashboard listing the latest violations
[Diagram: ModuleVersion (baz 4.2.10) -[DEPENDS_ON]-> ModuleVersion (pmd-checks 1.+)]
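A check for the three kinds of dynamic version named in the rule could be sketched like this. It covers only the forms listed on the slide (a trailing `+`, `LATEST`, and bracketed ranges); the function names and the edge tuple layout are hypothetical.

```python
import re

# A range starts with a bracket, contains a comma and ends with a bracket,
# e.g. "[1.0, 2.0[" or "[1.0,2.0)".
VERSION_RANGE = re.compile(r"^[\[\]\(].*,.*[\[\]\)]$")


def is_dynamic_version(version):
    """True for versions like "1.+", "LATEST" or "[1.0, 2.0["."""
    version = version.strip()
    return (
        version.endswith("+")
        or version == "LATEST"
        or bool(VERSION_RANGE.match(version))
    )


def find_violations(edges):
    """Keep the (dependent, dependency, version) edges with a dynamic version."""
    return [edge for edge in edges if is_dynamic_version(edge[2])]


edges = [
    ("baz:4.2.10", "pmd-checks", "1.+"),
    ("baz:4.2.10", "core", "1.0.5"),   # hypothetical pinned dependency
]
violations = find_violations(edges)
```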
Conclusion
• Service rolled out internally
• Neo4j is the perfect tool for capturing the data we’re interested in
‒ Very easy to refactor / enrich the data
• Cypher gives us insight from the aggregated data
• Solid foundation for future services
‒ Difficult part: Capturing the data
‒ Easy part: Leveraging the data by creating new queries
• Decisions based on facts, not (educated) guesses
• Holistic reporting
Q & A
Thanks for attending