Using ELK to Explore Defect Data
Xu Yabin
Singapore
Content
• Customer requirements and defect KPI definition
• ELK solution
• ELK compared to the traditional analytics method
Customer Requirements
• Online web applications that need to be deployed frequently
• Serious defects and quality issues
• Not enough testing before applications are deployed
• Defects are always out of control after applications are deployed
• Serious defects are always found after the application is deployed
• Serious defects are not fixed on time
• Implement a continuous integration and defect management system
• What the results are, and how to do continuous assessment of DevOps activities
Defect KPI Definition
• Based on the customer's requirements, the defect KPIs are defined as below:
• Defect number and distribution
• Number of defects before and after applications are deployed
• Number of serious defects before and after applications are deployed
• Time to fix serious defects
Data analytics tool requirements
• What data analytics tools do we need?
• Easily import defect data from the current defect system
• Easily configure and calculate to get the KPI data
• Explore defect data without any data model preparation
• Easily dig into the detailed information
• Easy to maintain
• We chose ELK (Elasticsearch, Logstash, Kibana)
Content
• Customer requirements and defect KPI definition
• ELK solution
• ELK compared to the traditional analytics method
ELK Solution
[Architecture diagram: original defect data from the defect management system is collected by Logstash (data collector) into Elasticsearch (distributed data storage and search engine), then analyzed and displayed with Kibana (data analytics and results).]
• Most of the work is done through configuration, not coding
Original defect data
• The original defect data comes from the customer's defect management system, in XML format
ELK data collector: Logstash
• Collect defect data using Logstash
• Compared to a traditional data collector (which needs much coding work), Logstash needs no code, only several lines of configuration
• Defect data is put into Elasticsearch through a Logstash pipeline (a configuration sketch follows)
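• As an illustration, a minimal Logstash configuration for this step might look like the sketch below. The file path, index name, and the parsed_xml target are assumptions for illustration, and option names vary slightly between Logstash versions:

input {
  file {
    # hypothetical path to the exported XML defect data
    path => "/data/defects/defects.xml"
    start_position => "beginning"
  }
}
filter {
  xml {
    # parse each XML record into fields under parsed_xml
    source => "message"
    target => "parsed_xml"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "defects"
  }
}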
ELK user interface configuration: Kibana
• Once the data is imported into Elasticsearch, the UI configuration can be done in Kibana
• UI configuration focuses on what will be displayed
• Configuration is done in a very natural way
• No business data model is needed before doing the configuration
ELK: User interface
• Easily add query conditions and filters to dig into the data
ELK: Filter and dig into the data: defect distribution by time
[Screenshot: the defect data view shows all defect data; most defects were created in the year 2015. Dragging the mouse over an area of the time chart filters the defect data to the selected time range.]
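• Behind this view, Kibana issues a date histogram aggregation. Below is a minimal sketch of the equivalent Elasticsearch request, assuming an index named defects and a creation timestamp field parsed_xml.created (neither name is from the original deck; Elasticsearch 7+ uses calendar_interval instead of interval):

GET defects/_search
{
  "size": 0,
  "aggs": {
    "defects_over_time": {
      "date_histogram": { "field": "parsed_xml.created", "interval": "month" }
    }
  }
}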
ELK: Filter and dig into the data: defect distribution by product
[Screenshot: the defect data view shows all defect data; the green slice is one product. Double-clicking the green product filters the defect data, and the view changes to show only that product's defects.]
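• Double-clicking a slice corresponds to adding a filter to the underlying query. A hedged sketch, reusing the query_string syntax and the drivers product that appear later in this deck (the index name defects is an assumption):

GET defects/_search
{
  "query": {
    "query_string": { "query": "parsed_xml.product:\"drivers\"" }
  }
}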
ELK: Multidimensional analysis: defect distribution by product
• Defects of different products; a different color stands for each product
ELK: Defect KPIs displayed
• Severity
• Defect before or after release
• Defect close time
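• Each of these KPI views can be produced by an aggregation. For instance, below is a sketch of a nested aggregation that breaks severity down by before/after release; the field names parsed_xml.severity and parsed_xml.release_phase are illustrative, not from the original deck:

GET defects/_search
{
  "size": 0,
  "aggs": {
    "by_release_phase": {
      "terms": { "field": "parsed_xml.release_phase" },
      "aggs": {
        "by_severity": {
          "terms": { "field": "parsed_xml.severity" }
        }
      }
    }
  }
}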
Content
• Customer requirements and defect KPI definition
• ELK solution
• ELK compared to the traditional analytics method
ELK: Advantages
• Analyze data without coding
• Fast delivery and low cost
• High flexibility to analyze data
• Easy to deploy and maintain
• Learn the business data before the data model is created
• Explore and dig into the data step by step, based on your understanding of the business
• Big data method
• Performance
• High availability
• Extendable
• Collect and import data easily
ELK: Why analyze data without coding
• Data analysis and display
• Traditional method
• The bottleneck is the relational database
• Aggregated analysis can't be done by the database itself
• We need to write code using SQL statements like GROUP BY and COUNT
• Even simple code makes the analytics difficult, because the data, the data processing, and the UI are coupled with the code
• ELK solution
• Powerful aggregated analysis and search capability
• UI is not coupled with the data
• Query conditions and filters can easily be added to the current query
• Simple and powerful aggregated analysis, like SQL GROUP BY
• Business concepts can be learned from data aggregation
• Below is Elasticsearch aggregation code, the equivalent of an SQL GROUP BY on product:

GET _search
{
  "aggs": {
    "product": {
      "terms": { "field": "parsed_xml.product" }
    }
  }
}

• The search result can be used to filter another query:

"query_string": { "query": "parsed_xml.product:\"drivers\" AND (*)" }
ELK: Query from configuration, not coding
• Traditional data query issues:
• Too much data returned from a select statement
• The main reason is that people don't know how much data will be returned before running the select
• The data is not filtered
• Too much data in one single table
• If one table is divided, the query code needs to be modified to merge the query results
• Too much influence on the existing program
• Not easy to extend when the data increases
How ELK deals with the data query issues
• Big data method and concept:
• When the amount of data cannot be processed or handled by a single point of resources (machines, CPUs, etc.), the data and the processing power can be split horizontally without substantially affecting the existing architecture
• ELK solution:
• Too much data returned from a select statement
• Count before the query (see the sketch after this list)
• Filter before the query using aggregation results
• Too much data in a single table
• One table can be divided with no need to change the query statement
• Time sequences are supported, so it is easy to divide time series data
• Easy to extend through distributed data storage
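• "Count before the query" maps directly onto Elasticsearch's _count API. A sketch, reusing the product filter from the aggregation example (the index name defects is an assumption):

GET defects/_count
{
  "query": {
    "query_string": { "query": "parsed_xml.product:\"drivers\"" }
  }
}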
ELK data storage: Elasticsearch distributed data storage
• From: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html
• "Elasticsearch allows you to start small and scale horizontally as you grow. Simply add more nodes, and let the cluster automatically take advantage of the extra hardware."
• "Elasticsearch clusters are resilient — they will detect new or failed nodes, and reorganize and rebalance data automatically, to ensure that your data is safe and accessible."
Traditional data collector issues
• The database is strictly defined by data types (schema)
• The same data may have different data types in different systems
• The data schema relationships (data mappings) between different systems must be defined correctly before the data import
• Otherwise the data import will fail
How ELK deals with the data collector issues
• ELK solution:
• Schema-less data import
• No need to consider data types before the import
• If the default data type is not right, it can be changed (see the mapping sketch below)
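• For example, if a timestamp is imported as a string by the schema-less import, an explicit mapping can be set before re-importing the data. A sketch in Elasticsearch 7+ syntax (older versions nest the mapping under a type name; the field name and date format are illustrative):

PUT defects
{
  "mappings": {
    "properties": {
      "parsed_xml": {
        "properties": {
          "open_time": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
        }
      }
    }
  }
}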
ELK data import: Logstash pipeline
• With the existing plug-ins, much less programming, or none at all
• Filtering, processing, and data enrichment steps can easily be added to an existing collection pipeline
• Input and output contents are flexible and extendable
[Pipeline diagram:
Input: defect data file → Filter 1: normalize the XML format → Filter 2: get and parse the defect data → Filter 3: change the time format of the input data → Output: Elasticsearch.
To get the defect fixed time, add Filter 4: a "defect fixed time" field calculated as defect close time minus defect open time.]
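• Filter 4 could be implemented with Logstash's ruby filter. A sketch, assuming the open and close timestamps were already parsed under parsed_xml (the field names are illustrative, and the event.get/event.set API is from Logstash 5+; 1.x used event['field']):

filter {
  ruby {
    code => "
      require 'time'
      # defect fixed time = defect close time minus defect open time, in hours
      open_t  = Time.parse(event.get('[parsed_xml][open_time]'))
      close_t = Time.parse(event.get('[parsed_xml][close_time]'))
      event.set('defect_fixed_time_hours', ((close_t - open_t) / 3600.0).round(2))
    "
  }
}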
ELK data import: Logstash architecture
• From: https://www.elastic.co/guide/en/logstash/1.5/deploying-and-scaling.html