Building a Graph-based Analytics Platform

description

Meetup is a valuable source of data for understanding trends around products or brands. Meetup does not offer an analytics package to track group statistics over time unless you are an administrator of a group. There are no third-party tools or websites that analyze Meetup trends to understand how communities grow. In this talk I will present a graph-based analytics platform that uses the Meetup.com API to collect and analyze membership statistics over time. This talk will cover: how to poll and import periodic data from the Meetup.com API into Neo4j using Node.js; how to track meetup group growth over time in a Neo4j graph database using Node.js; how to apply tags to meetup groups and report the combined growth of all groups over time; how to build an interactive, documented analytics API to support applications using Node.js and Neo4j; and how to build a business dashboard to visualize time-based statistics and reports using a Node.js-based REST API that queries Neo4j.

Transcript of Building a Graph-based Analytics Platform

  • (graphs)-[:are]->(everywhere) Building a graph-based analytics platform. All Rights Reserved 2014 | Neo Technology, Inc. @kennybastani, Neo4j Developer Evangelist
  • Using Meetup as an example use case Meetup.com is a valuable source of data for understanding trends around products or brands. Understanding demand is key for delivering compelling content at meetups. It sounded like a great use case for Neo4j.
  • The Problem Track meetup group growth over time. Apply tags to meetup groups and report combined growth of all groups over time.
  • Questions
  • Question #1 Given a start date and an end date, what is the time series that plots the membership growth of a given meetup group?
  • Question #2 Given a start date, an end date, and a combination of tags, what is the time series that plots the combined membership growth of all meetup groups with those tags?
  • Question #3 How do you generate the JSON data of a time series for a basic JS line chart plugin?
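As a rough illustration of Question #3, the sketch below turns a list of membership readings into a JSON payload with parallel label and value arrays, which is a shape many JS line chart plugins can consume. The row fields (`day`, `members`), the output keys (`labels`, `values`), and the function name are assumptions made for this example, not the schema used in the talk.

```javascript
// Hypothetical row shape: one membership reading per date for a single group.
// Field names (day, members) are assumptions for illustration only.
function toLineChartJson(rows) {
  // Copy before sorting so the caller's array is left untouched.
  // ISO dates (YYYY-MM-DD) sort chronologically when compared as strings.
  const sorted = [...rows].sort((a, b) => a.day.localeCompare(b.day));
  return JSON.stringify({
    labels: sorted.map((r) => r.day),     // x-axis: dates
    values: sorted.map((r) => r.members), // y-axis: membership counts
  });
}

const json = toLineChartJson([
  { day: "2014-02-01", members: 120 },
  { day: "2014-01-01", members: 100 },
]);
console.log(json);
```

A real chart plugin may expect a different layout (for example an array of `[x, y]` pairs), so the mapping step would need to be adapted to it.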
  • The Goal
  • The GraphGist Project The GraphGist project is a way to quickly build a graph-based proof of concept on Neo4j. I started with a GraphGist: "Neo4j for Graph Analytics: Meetup.com Example".
  • Graph Data Model
  • How are groups connected?
  • How are locations connected?
  • How are tags/topics connected?
  • How are stats connected?
  • How are days connected?
  • How are weeks connected?
  • How are months connected?
  • How are years connected?
  • Tackling Time in Neo4j How do you implement a time series in Neo4j? For any node that represents a unit of time, use a timestamp. Traversals can be costly for selecting time series. Expose a REST API that takes a normal date format and then converts it to an integer (an Int32) that allows you to select a range of dates in your Neo4j Cypher query.
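The date-to-integer step described above could look roughly like the following sketch. It converts ISO date strings to Unix epoch seconds (which fit in an Int32 until 2038) and pairs them with a parameterized Cypher range query. The `Day` label, the `timestamp` property, and both function names are hypothetical; the talk does not show its actual schema or query.

```javascript
// Convert a date string such as "2014-01-01" to Unix epoch seconds.
// Date-only ISO strings are parsed as UTC by Date.parse.
function toEpochSeconds(isoDate) {
  const ms = Date.parse(isoDate);
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid date: ${isoDate}`);
  }
  return Math.floor(ms / 1000);
}

// Build a query plus parameters for selecting a range of time nodes.
// The Day label and timestamp property are assumptions for illustration.
function buildRangeQuery(startDate, endDate) {
  return {
    query:
      "MATCH (d:Day) WHERE d.timestamp >= $from AND d.timestamp <= $to " +
      "RETURN d ORDER BY d.timestamp",
    params: {
      from: toEpochSeconds(startDate),
      to: toEpochSeconds(endDate),
    },
  };
}

const range = buildRangeQuery("2014-01-01", "2014-02-01");
console.log(range.params);
```

Passing the integers as query parameters, rather than concatenating them into the query text, keeps user input out of the Cypher string. The `$from`/`$to` parameter syntax is the one used by recent Neo4j versions; older releases wrote parameters as `{from}`.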
  • Scale it up! It started with a GraphGist, and then I said: "Why not? Let's build something cool using Neo4j."
  • Challenges I decided to take my GraphGist and make a full platform. There were some challenges.
  • Challenge #1 How do I get historical Meetup group statistics for all groups?
  • Challenge #2 How do I handle the data import on a daily basis?
  • Challenge #3 What kind of reports do I want to create? What do I want to know about Meetup groups?
  • Challenge #4 How do I safely expose Neo4j to a client-side charting control?
  • Ask Questions I decided to start asking some questions about my data model.
  • What do I want to know? Assuming I had as much historical Meetup data as I pleased, what kind of questions would I want to ask about that data? How would I want to present it?
  • What's the combined growth percent of Meetup groups having a certain topic? This line chart plots the time series for a meetup group topic on Meetup.com. Each group on Meetup.com has a set of topics associated with it. This chart is meant to show the percent growth month over month.
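The slides do not give the formula behind the month-over-month percentage, so the sketch below assumes the usual definition: the change from the previous month divided by the previous month's count, times 100. The function name and input shape (an array of combined member counts, one per month) are hypothetical.

```javascript
// Month-over-month growth percent for a series of monthly member counts.
// Assumed formula: (current - previous) * 100 / previous.
function monthOverMonthGrowth(counts) {
  const growth = [];
  for (let i = 1; i < counts.length; i++) {
    const prev = counts[i - 1];
    const curr = counts[i];
    // Growth is undefined when the previous month had no members.
    growth.push(prev === 0 ? null : ((curr - prev) * 100) / prev);
  }
  return growth;
}

const growth = monthOverMonthGrowth([100, 110, 121]);
console.log(growth); // [ 10, 10 ]
```

The result has one fewer entry than the input, because the first month has no earlier month to compare against.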
  • What's the cumulative growth of Meetup groups with a specific topic? This bar chart plots the cumulative growth of a meetup group topic on Meetup.com. Using the time series data of monthly growth from the Meetup Tag Growth % chart, the growth percents over the period are aggregated into a sum for each topic. This chart shows total growth percentage over the period.
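The aggregation described above, summing the monthly growth percents per topic, can be sketched as follows. The row shape (`topic`, `growthPercent`), the placeholder topic names, and the function name are assumptions for this example.

```javascript
// Sum monthly growth percents into one cumulative figure per topic.
// Row fields (topic, growthPercent) are hypothetical.
function cumulativeGrowthByTopic(rows) {
  const totals = {};
  for (const { topic, growthPercent } of rows) {
    totals[topic] = (totals[topic] || 0) + growthPercent;
  }
  return totals;
}

const totals = cumulativeGrowthByTopic([
  { topic: "topic-a", growthPercent: 10 },
  { topic: "topic-b", growthPercent: 2.5 },
  { topic: "topic-a", growthPercent: 5 },
  { topic: "topic-b", growthPercent: 7.5 },
]);
console.log(totals); // { 'topic-a': 15, 'topic-b': 10 }
```

Note that summing monthly percentages, as the slide describes, is not the same as compounding them; the sketch follows the slide's description.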
  • What's the relative growth of Meetup groups with a topic for a date range? This donut chart plots the relative cumulative growth of a meetup group topic on Meetup.com. Using the data from Cumulative Meetup Growth, the percentage growth of each topic over the period is compared relative to one another as a ratio of 100.
  • How many groups does a topic have relative to others? This donut chart plots the number of groups in the region during the period for each topic. Each group is compared relative to one another as a ratio of 100.
  • What's the growth percent of all groups for a topic in a location for a date range? This report is a simple table that shows the growth percent of all groups for a topic, broken down by location. What do these high percentages tell us about Meetup? Within the last year there has been massive growth for meetup groups that are focused on NoSQL database technology. If I imported a different topic, not related to technology, what would the data show?
  • How do I give users a clean set of controls to filter and search?
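Both donut charts above compare values "as a ratio of 100", which reads as normalizing each topic's value so that all shares add up to 100. A minimal sketch of that normalization, with a hypothetical function name and placeholder topic names, might be:

```javascript
// Scale per-topic values so that the shares sum to 100.
// Works for cumulative growth figures or for group counts.
function relativeShares(totals) {
  const sum = Object.values(totals).reduce((acc, v) => acc + v, 0);
  const shares = {};
  for (const [topic, value] of Object.entries(totals)) {
    // Avoid dividing by zero when there is no data at all.
    shares[topic] = sum === 0 ? 0 : (value * 100) / sum;
  }
  return shares;
}

const shares = relativeShares({ "topic-a": 30, "topic-b": 10 });
console.log(shares); // { 'topic-a': 75, 'topic-b': 25 }
```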
  • Scaling it up Designing a graph-based analytics platform using Node.js and Neo4j
  • Architecture Front-end web-based dashboard in Node.js and Bootstrap; REST API via Neo4j Swagger in Node.js; data import services in Node.js; data storage in Neo4j graph database.
  • Applications Analytics REST API (Node.js); Analytics Dashboard (Node.js); Data Import Scheduler (Node.js). Application types shown: Web, Web, Console.
  • [Architecture diagram] Neo4j (JVM): graph data storage, analytical queries. REST API (Node.js): web app, retrieves report data. Dashboard (Node.js): web app, visualizes report data (presentation, filtering). Import Scheduler (Node.js): polls the Meetup API. Flow labels: Filter, Query, Import.
  • Analytics Dashboard
  • Analytics REST API
  • Data Import Scheduler
  • REST API The REST API is a fork of Neo4j Swagger. Swagger is a specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services.
  • Demo http://meetup-analytics-api.herokuapp.com/
  • Swagger The REST API module of this project is based on a fork of Swagger.
  • The Neo4j Swagger Project The Swagger project was modified to use Neo4j as its data source. The REST API module of this project is extended from the Neo4j Swagger project.
  • REST API Methods Get Weekly Growth; Get Monthly Growth; Get Monthly Growth By Tag; Get Monthly Growth By Location; Get Cities; Get Countries; Get Group Count By Tag
  • Get Weekly Growth Gets the weekly growth percent of meetup groups as a time series. Returns a set of data points containing the week of the year, the meetup group name, and membership count.
  • Get Monthly Growth Gets the monthly growth percent of meetup groups as a time series. Returns a set of data points containing the month of the year, the meetup group name, and membership count.
  • Get Monthly Growth By Tag Gets the monthly growth percent of meetup group tags as a time series. Returns a set of data points containing the month o