
  • Exploiting the Internet of Things with investigative analytics

    A White Paper by Bloor Research
    Author: Philip Howard
    Publish date: May 2013

  • "The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so."

    Kevin Ashton

  • Introduction

    There is a wealth of information hidden in the Internet of Things that can help organisations to understand what happened or might happen, why it happened or may happen, and what to do about it. However, before we consider how to analyse this information and why it is important to your business, we need to understand what we mean by the Internet of Things and by investigative analysis.

    The Internet of Things

    The Internet of Things was first described by Kevin Ashton in 1999. He wrote: "computers (and, therefore, the Internet) are almost wholly dependent on human beings for information. Nearly all of the data available on the Internet was first captured and created by human beings: by typing, pressing a record button, taking a digital picture or scanning a bar code. Conventional diagrams of the Internet ... leave out the most numerous and important routers of all: people. The problem is, people have limited time, attention and accuracy, all of which means they are not very good at capturing data about things in the real world. And that's a big deal. We're physical, and so is our environment ... You can't eat bits, burn them to stay warm or put them in your gas tank. Ideas and information are important, but things matter much more. If we had computers that knew everything there was to know about things, using data they gathered without any help from us, we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best. The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so."

    Today there are multiple definitions of the Internet of Things, but this is as good a place to start as any: the point is that a) more and more things (vehicles, smart meters, cell phones, planes, oil rigs, shop floor devices, clickstream data, anything with an active RFID tag, and so on) are being or have been instrumented, and b) we now have the ability to analyse the information coming from this instrumentation in a cost-effective manner, so that the Internet of Things is becoming a reality.

    What does the Internet of Things mean to your organisation? Clearly it depends on your business, but in principle it allows you to perform what we might call investigative analytics: exploring the what, why and how of all this instrumented data.

    Investigative analysis

    The term investigative analysis was first coined by Curt Monash in 2011 to describe a function that supports research, investigation and analysis in support of future decisions. He defines it as "seeking (previously unknown) patterns in data". More specifically he describes it as a conflation of several disciplines, including "statistics, data mining, machine learning and/or predictive analytics; together with the more research-oriented aspects of business intelligence tools, including ad hoc query, drill-down, most things done by BI-using business analysts, and most things within BI called data exploration; plus analogous technologies as applied to non-tabular data types such as text or graph".

    In other words, you are interested in discovering a pattern of past activity that points to some likely outcome in the future. And you want to be able to do that across any type of data, regardless of whether it is transactional or not. To put this another way: something happened; is this part of a pattern that indicates it might happen again? If so, what is that pattern and how can we leverage it for business purposes in the future?
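    To make this concrete, here is a purely illustrative sketch in Python; the event names, time window and data are invented for the example rather than drawn from any real system. Given a stream of timestamped events, it counts how often one type of event is followed by another within a fixed window, which is exactly the sort of previously unknown pattern that can then be used to anticipate the outcome the next time the first event occurs.

      from datetime import datetime, timedelta

      # Hypothetical event stream: (timestamp, event_type) pairs, sorted by time.
      events = [
          (datetime(2013, 5, 1, 9, 0), "temperature_spike"),
          (datetime(2013, 5, 1, 9, 40), "component_failure"),
          (datetime(2013, 5, 2, 14, 0), "temperature_spike"),
          (datetime(2013, 5, 2, 14, 55), "component_failure"),
          (datetime(2013, 5, 3, 11, 0), "temperature_spike"),
      ]

      WINDOW = timedelta(hours=2)  # assumed window in which to look for a follow-on event

      def followed_within(events, first, second, window):
          """Count how often `first` is (and is not) followed by `second` within `window`."""
          followed, total = 0, 0
          for i, (t0, e0) in enumerate(events):
              if e0 != first:
                  continue
              total += 1
              if any(e1 == second and t0 < t1 <= t0 + window for t1, e1 in events[i + 1:]):
                  followed += 1
          return followed, total

      followed, total = followed_within(events, "temperature_spike", "component_failure", WINDOW)
      print(f"{followed} of {total} spikes were followed by a failure within {WINDOW}")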

    In this paper we will explore some of the use cases around investigative analysis and how it can be applied to the Internet of Things, and then go on to consider the sort of technology you need to enable this capability. We will conclude with a discussion of the solution provided by Infobright, a data warehousing vendor that is addressing the market for investigative analytics.

  • Use cases

    There are a great many potential environments where investigative analytics might be deployed. The following represent a sampling only and, as the Internet of Things becomes more prevalent, it is likely that new use cases will emerge. However, broadly speaking we can say that investigative analysis will allow you to:

    1. Discover why something went wrong and determine what to do to prevent it going wrong in the future. This would apply to things like dropped calls on mobile networks, preventative maintenance across various industry sectors, smart meters (ditto), routing of both transportation and goods, and so on.

    2. Discover why something went right so that you can build processes to support an increased likelihood of things going right in the future: for example, monitoring and analysing web traffic or mobile usage to encourage upsell or cross-sell opportunities. Sales of location-based services in mobile environments are a particular case in point.

    3. Plan capacity to support requirements and service level agreements in the most cost-effective fashion. This applies particularly in smart metering environments, mobile services and transportation environments, amongst others, where forecasting and meeting future demand is essential.

    It is worth noting, before we move on to discuss individual use cases, that a number of these scenarios require real-time query processing as well as more in-depth, batch-oriented analytics.

    Smart meters

    Smart metering is of increasing interest around the world. While there are significant implementations already in the United States, the rest of the world is some way behind in this respect. However, that will change: for example, in accordance with European Union market guidelines, 80% of all households in Germany should be equipped with smart meters by 2020. Smart meters are installed in homes and businesses and feed a steady stream of data to the relevant application, where the data is analysed and the results used to efficiently allocate energy resources in real time, so that less energy is wasted. In addition, information collected from smart meters can be combined with weather forecasts and other data (such as major sporting events or TV programmes) to predict future energy requirements so that appropriate resources will be available. If we return to the German example, some 32 million households will need to be metered by 2020, which represents an enormous amount of event data that must be captured, analysed, and acted upon.
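    As a simple illustration of the forecasting side of this, the Python sketch below fits a hypothetical consumption history against temperature and then applies the model to a forecast; the readings, temperatures and the choice of a quadratic fit are assumptions made for the example, not utility data, and real demand forecasting is considerably more sophisticated.

      import numpy as np

      # Hypothetical history: average hourly consumption (kWh) for a metered area
      # at different outside temperatures (degrees C).
      temperature = np.array([2.0, 5.0, 8.0, 12.0, 16.0, 20.0, 24.0])
      consumption = np.array([31.0, 28.5, 26.0, 22.0, 19.5, 21.0, 25.5])  # heating at one end, cooling at the other

      # Fit a simple quadratic model: consumption ~ a*T^2 + b*T + c.
      model = np.poly1d(np.polyfit(temperature, consumption, deg=2))

      # Predict demand for tomorrow's forecast temperatures so that capacity can be planned.
      forecast = np.array([4.0, 10.0, 22.0])
      for t, demand in zip(forecast, model(forecast)):
          print(f"{t:.0f} C -> about {demand:.1f} kWh")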

    Banking and financial services

    ATMs (automated teller machines) are not dissimilar to smart meters: they provide you with a service (money, statements and so forth) and they update your account. They have not, historically, been used to collect data in order to forecast future demand, but it is clear that there is a shift away from cash and towards automated payments of various kinds. Banks are therefore looking to rationalise their ATM networks. On the other hand, they do not wish to alienate existing customers that still need access to cash. Understanding who uses cash machines and how often will be fundamental to any decisions and, given that you can withdraw cash from your bank at a rival bank's ATM, it will make sense for banks to collaborate on where ATMs should be rationalised.

    The flip side of the move away from cash is the increase in mobile payments. This raises two interesting areas with respect to investigative analysis. One is that the less people use cash and the more they use electronic payments, the easier it is to profile those individuals and to understand their preferences, which, in turn, can enable better upsell and cross-sell opportunities. Conversely, there are also security implications: in particular, the better you understand customer spending patterns, the better you are able to detect the likelihood that a card or mobile phone has been stolen and is being used for fraudulent purposes.
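    A minimal sketch of the security side of this, using an entirely hypothetical transaction history, is to profile a customer's normal spending and flag payments that deviate sharply from it. Real fraud detection uses far richer models and many more variables, but the underlying idea is the same.

      import statistics

      # Hypothetical recent card transactions (amounts) for one customer.
      history = [12.50, 8.20, 45.00, 9.99, 22.10, 15.75, 30.00, 11.20]

      mean = statistics.mean(history)
      spread = statistics.pstdev(history)

      def looks_anomalous(amount, threshold=3.0):
          """Flag an amount more than `threshold` standard deviations from this customer's norm."""
          return abs(amount - mean) > threshold * spread

      for amount in [18.40, 950.00]:
          print(amount, "-> review" if looks_anomalous(amount) else "-> looks normal")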

    Network analysis in telecommunications

    For both performance and planning purposes telcos need to monitor and analyse traffic. Key elements to determine are hot spots within the network (areas with particularly high usage) and failures within the network. A proper understanding of the former, and how this is developing over time, will be critical to future investments in new infrastructure in order to meet growing demand. Conversely, any failure within the network is an immediate problem that needs to be resolved as speedily as possible. Failures may lead to connections being dropped (which are well-known potential indicators of customer churn) or reduced service. The analysis of usage trends, combined with location-based and demographic data, will be important for planning future infrastructure investments.
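    By way of illustration only (the cell identifiers, traffic figures and the 80% threshold below are invented for the sketch), hot spots can be surfaced simply by aggregating traffic per cell and comparing it against an assumed capacity for the period.

      from collections import Counter

      # Hypothetical call records: (cell_id, minutes of traffic) from network probes.
      calls = [("cell_17", 3), ("cell_17", 8), ("cell_02", 5), ("cell_17", 12),
               ("cell_09", 2), ("cell_02", 4), ("cell_17", 6)]

      capacity_minutes = {"cell_17": 20, "cell_02": 40, "cell_09": 40}  # assumed capacity per period

      usage = Counter()
      for cell, minutes in calls:
          usage[cell] += minutes

      # A "hot spot" here is any cell running above 80% of its assumed capacity.
      hot_spots = {cell: mins for cell, mins in usage.items()
                   if mins > 0.8 * capacity_minutes[cell]}
      print(hot_spots)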

    In telecommunications there are also a number of other areas where investigative analysis may be used: for example, in connection with mobile payments, location-based services and so forth, as previously discussed.

    Preventative maintenance

    The average oil platform has 40,000 sensors. A flight across the Atlantic generates over 9TB of data about the status of the plane you are in. Trains and railway tracks abound with sensors. Nor is this limited to transportation: equipment of all sorts, whether on construction sites or the shop floor, has built-in monitors and sensors designed to alert operators, pilots or drivers to any problems that may occur. Historically, however, this information has been discarded rather than analysed, principally because the tools were not available to analyse it in a cost-effective manner. With modern technology this is now changing, and this wealth of information is being used to identify patterns of failure (if this component fails then that one is likely to do so within a certain period) and to predict the failure of particular elements so that preventative maintenance can avert potential problems.

    It should be noted that preventative maintenance doesn't just apply to equipment and machinery of various sorts but also to people. For example, there are a number of professional sports bodies (for example, in football) that monitor their players' activities on the field and in training so that they can be rested at appropriate times in order to avoid injury. As a different example, there is a company in Australia that provides glasses to drivers of trucks and heavy (mining) equipment that monitor how often the drivers blink, blinking being a sign of tiredness. Not only does this prevent accidents in the short term (the driver will be alerted if tiredness is indicated) but subsequent analysis of the data can help to optimise shift patterns and rosters. Telemetry used by motor insurance providers to monitor driving safety is yet another example (though the emphasis here is more on calculating premiums) and no doubt there are also applications within healthcare.
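    Taking the blink-monitoring example, a minimal sketch of the real-time side (with invented readings and an assumed alert threshold) is a rolling average over recent blink counts that raises an alert when tiredness is indicated; the same stream, stored and analysed over weeks, is what feeds the shift-pattern and roster analysis.

      from collections import deque

      # Hypothetical stream of per-minute blink counts from the driver's glasses.
      blink_counts = [14, 15, 13, 16, 18, 22, 25, 27, 26, 28]

      WINDOW = 5           # minutes of history to average over
      ALERT_RATE = 22.0    # assumed blinks-per-minute level indicating tiredness

      recent = deque(maxlen=WINDOW)
      for minute, blinks in enumerate(blink_counts, start=1):
          recent.append(blinks)
          average = sum(recent) / len(recent)
          if len(recent) == WINDOW and average > ALERT_RATE:
              print(f"minute {minute}: average blink rate {average:.1f} - alert driver")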

    Logistics

    Wherever GPS signals are part of a business process there is likely to be an application for investigative analytics. For example, one of the major oil companies that has oil rigs in the Arctic uses GPS tracking information combined with weather data to predict the movement of ice floes that can impact on drilling operations. More prosaically, road transportation, field service management and similar sectors are heavily dependent on traffic patterns and the location of relevant vehicles to optimise routing, while container tracking has similar requirements. In these cases there are both real-time issues (recognising that re-routing would be appropriate and doing so) and long-term analytic requirements that will enable better routing in future.

    Comparable requirements can also apply in retail and manufacturing environments where, instead of using GPS signals, goods or parts are identified by RFID tags. One leading aircraft manufacturer, for example, makes extensive use of this technology for part tracking and optimisation.

    Security information and log data

    Unlike the other use cases discussed, the management and analysis of security information (who might be attacking your company's infrastructure and how) is by no means a new market. The standard approaches to this market include the use of SIEM (security information and event management) and log management, where the former is a superset of the latter that includes (near) real-time identification of attack vectors as well as the storage and forensic analysis capabilities provided against log data. The storage of log data needs to be extremely efficient. Historically, companies used to store only a few months' worth of data for online analysis. However, the increase in low and slow attacks, or advanced persistent threats (APTs), which can spread over not just months but years, means that very efficient storage mechanisms are required that at the same time support the sort of in-depth analytics needed to identify patterns of activity. Such patterns may be fraudulent or, more often, patterns that will enable the identification of threats.
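    As an illustration of why long retention matters, the hypothetical sketch below (the log entries, addresses and thresholds are invented) aggregates failed logins per source across many months. A source that fails only once every few weeks never stands out within a short window, but becomes obvious when the whole period is analysed together.

      from collections import defaultdict
      from datetime import date

      # Hypothetical authentication log spanning many months: (day, source, outcome).
      log = [
          (date(2012, 11, 3), "203.0.113.7", "fail"),
          (date(2012, 12, 18), "203.0.113.7", "fail"),
          (date(2013, 1, 22), "203.0.113.7", "fail"),
          (date(2013, 2, 9), "198.51.100.4", "fail"),
          (date(2013, 3, 14), "203.0.113.7", "fail"),
          (date(2013, 4, 2), "203.0.113.7", "fail"),
      ]

      failures = defaultdict(list)
      for day, source, outcome in log:
          if outcome == "fail":
              failures[source].append(day)

      # A "low and slow" suspect here is a source whose failures span more than 90 days.
      for source, days in failures.items():
          span = (max(days) - min(days)).days
          if len(days) >= 4 and span > 90:
              print(f"{source}: {len(days)} failures over {span} days - investigate")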


    In summary

    We do not need to belabour the point: almost anywhere that machines or devices generate information there is scope for investigative analytics, because machines always go wrong, or are in the wrong place, or are doing something interesting right now that you would like to know about. And you would like to know about it not just so that you can take appropriate action at this particular moment, but also so that you can analyse and predict when this might happen again, prevent it and, thereby, provide a better service to your customers and/or users.

  • What is required

    Because you are potentially going to be storing and analysing a lot of data, you will need a technology that enables you to exploit this data for business insight: to reduce costs, identify new revenue streams, and improve competitive positioning. However, because the sorts of analytics we are discussing include a real-time component, a simple batch-based analytic environment (such as Hadoop) will not be sufficient for the fast, interactive queries needed. There are therefore a number of requirements for such an engine, as follows:

    1. It must be scalable enough to hold all the data you need for long-term analytics. This will obviously be dependent on the environment. For example, determining preventative maintenance characteristics against the 40,000 sensors on an oil rig will certainly require months' and quite possibly years' worth of data, which is on a different scale from network analysis in a telecommunications company, which has a limited number of masts and only needs to perform analyses across limited periods of time.

    2. It must be fast enough to ingest the data within a reasonable timeframe, depending on the latency required. That is, you need to be able to load the data fast enough to support whatever real-time query processing or alerting is required.

    3. We are potentially talking about very large quantities of data, notwithstanding the comments made in point 1. In order to store this in an economical fashion you need very efficient compression of the data so that storage requirements can be minimised.

    4. Next, while not necessarily imperative, it is likely that you will want a system that requires no manual tuning or DBA administration, such as the creation of indexes. If you have to index the data as it is loaded, this will significantly slow down the loading process and it will add to the size of the database, not to mention adding to administrative costs.

    5. Finally, the actual time taken to process queries needs to be fast enough to meet service level requirements, especially bearing in mind that this may involve complex analytics. In addition, it is probable that you will wish to run ad hoc queries against the data as well as running standard reports and analytic processes, so the database will need to be fast enough and flexible enough to support this in an efficient manner. Databases that use indexes or other constructs, such as projections, to achieve fast query performance will not usually provide good enough performance for unplanned (ad hoc) queries.

    6. The sorts of applications we are discussing are often mission-critical as they support the real-time operations of your organisation. It is therefore necessary that any solution is at least highly available (caters for unplanned downtime without stopping) and, preferably, that it is continuously available (caters for planned downtime as well as unplanned stoppages).

    Of course there are also more generic requirements such as simple and quick implementation, low costs (both direct and indirect), minimal administration and so on.

  • Infobright

    Infobright is a provider of analytic database technology that comes in three flavours: enterprise-class appliance configurations, software-only installations, and embedded OEM implementations. At its core, Infobright is a columnar database initially built on MySQL. Column-oriented databases are better suited for analytics than row-based databases since, unlike transaction processing environments, it is commonly the case that only a limited subset of columns is required from each record. By grouping the data together in this way, the database only needs to retrieve columns that are relevant to the query, greatly reducing the overall I/O. Being column-based also has the advantage of providing improved compression, which further reduces storage and improves performance.

    However, Infobright goes beyond the conventional use of columns to provide even better performance, better compression, and reduced administration through the use of its Knowledge Grid. This is based on the concept of Data Packs. The data within each column is stored in 64K item groupings called Data Packs. The use of Data Packs improves data compression as the optimal compression algorithm is applied based on the data contents. According to Infobright, an average compression ratio of 10:1 is achieved after loading data into Infobright (though many users see compression of 40:1 and more). At the same time the software creates metadata about the contents of each Data Pack as it is being loaded. This metadata is stored in the Knowledge Grid. The Knowledge Grid contains information about the contents of each Data Pack as well as the relationships between Data Packs, which are automatically created and stored. This includes a set of statistics and aggregate values of the data from each Data Pack, such as MIN, MAX, SUM, AVG, COUNT, and number of NULLs. A further set of metadata describing ranges of numeric value occurrences and character positions, as well as column relationships between Data Packs, is also stored.

    As a query comes in, Infobright uses the information in the Knowledge Grid to determine which Data Packs are relevant to the query before decompressing any data. In many cases, the summary information already contained in the Knowledge Grid is sufficient to resolve the query, and nothing is decompressed. Working together, the Data Packs, Knowledge Grid and Infobright's iterative computing engine (Granular Computing Engine) should ensure fast, consistent query performance even when data volumes increase dramatically. Needless to say, the Knowledge Grid is automatically updated whenever the database is updated.
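    The general idea can be illustrated with a rough Python sketch. To be clear, this is an analogy rather than Infobright's actual implementation: each pack of column values carries MIN/MAX/COUNT metadata gathered at load time, and a range query consults that metadata first, skipping irrelevant packs entirely, answering fully covered packs from the metadata alone, and only touching the contents of packs that are partially covered.

      # Rough, hypothetical illustration of metadata-based pack pruning.
      values = list(range(0, 200_000))        # one column of a hypothetical table
      PACK_SIZE = 65_536                      # packs hold 64K items apiece

      packs = []
      for i in range(0, len(values), PACK_SIZE):
          chunk = values[i:i + PACK_SIZE]
          packs.append({"min": min(chunk), "max": max(chunk),
                        "count": len(chunk), "data": chunk})  # "data" stands in for the compressed pack

      def count_in_range(lo, hi):
          """COUNT(*) WHERE lo <= value <= hi, touching pack data only when metadata is not enough."""
          total, packs_opened = 0, 0
          for pack in packs:
              if pack["max"] < lo or pack["min"] > hi:
                  continue                           # irrelevant pack: skipped entirely
              if lo <= pack["min"] and pack["max"] <= hi:
                  total += pack["count"]             # fully covered: answered from metadata alone
              else:
                  packs_opened += 1                  # partially covered: would be decompressed
                  total += sum(1 for v in pack["data"] if lo <= v <= hi)
          return total, packs_opened

      print(count_in_range(70_000, 140_000))  # only 2 of the 4 packs need to be opened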

    Note that, thanks to the Knowledge Grid, Infobright does not require you to partition or index the data. This not only reduces administration but also prevents data skew, which is a performance problem for vendors using horizontal (row-based) partitioning and which forces re-balancing of the database.

    In addition, by eliminating the need to partition the data, Infobright delivers support for ad hoc queries, which are a foundational requirement for investigative analytics. The reason for this is that if you partition (or shard) your data, you limit the way that you can access the data: if your query matches the way you have partitioned the data then your queries will perform well, but if they don't then they won't. In other words, partitioning works best when you know in advance what queries you are going to ask, which is the antithesis of ad hoc and self-service query processes. By not needing to partition data, Infobright ensures a consistent level of query performance regardless of the nature of the query (assuming equal complexity).

    As far as loading is concerned, Infobright can load data at up to terabytes per hour with a multi-machine loader configuration. Many customers use Infobright in a highly dynamic production environment where new data needs to be loaded and accessed within minutes for near-real-time analytics.

  • Conclusion

    As Kevin Ashton wrote, back in the last century, "The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so." It has taken more than a decade, but the Internet of Things is here. It isn't yet as widely implemented as it will be, and it will take a while before its full impact is felt, both at a business level and in our daily lives, especially as it is exploited through the use of investigative analytics. But make no mistake: it is here and it is growing. From a business perspective this has very significant repercussions: with the addition of investigative analytics, the Internet of Things will enable substantial steps forward in customer service in the present, and in business planning for the future. Like all major technology changes, this combination of capabilities offers both opportunities and threats, and there will be winners and losers. The winners will be those that grasp these new technologies and use them to enhance and expand their business.

    Further Information

    Further information about this subject is available from http://www.BloorResearch.com/update/2170

  • Bloor Research overview

    Bloor Research is one of Europe's leading IT research, analysis and consultancy organisations. We explain how to bring greater Agility to corporate IT systems through the effective governance, management and leverage of Information. We have built a reputation for telling the right story with independent, intelligent, well-articulated communications content and publications on all aspects of the ICT industry. We believe the objective of telling the right story is to:

    Describe the technology in the context of its business value and the other systems and processes it interacts with.

    Understand how new and innovative technologies fit in with existing ICT investments.

    Look at the whole market and explain all the solutions available and how they can be more effectively evaluated.

    Filter noise and make it easier to find the additional information or news that supports both investment and implementation.

    Ensure all our content is available through the most appropriate channel.

    Founded in 1989, we have spent over two decades distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services, events and consultancy projects. We are committed to turning our knowledge into business value for you.

    About the author

    Philip Howard
    Research Director - Data Management

    Philip started in the computer industry way back in 1973 and has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR.

    After a quarter of a century of not being his own boss, Philip set up his own company in 1992 and his first client was Bloor Research (then Butler Bloor), with Philip working for the company as an associate analyst. His relationship with Bloor Research has continued since that time and he is now Research Director focused on Data Management.

    Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.

    In addition to the numerous reports Philip has written on behalf of Bloor Research, Philip also contributes regularly to IT-Director.com and IT-Analysis.com and was previously editor of both Application Development News and Operating System News on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times. Philip speaks regularly at conferences and other events throughout Europe and North America.

    Away from work, Philip's primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master), dining out and walking Benji the dog.

  • Copyright & disclaimer

    This document is copyright 2013 Bloor Research. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research.

    Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research's intent to claim these names or trademarks as our own. Likewise, company logos, graphics or screen shots have been reproduced with the consent of the owner and are subject to that owner's copyright.

    Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.

  • 2nd Floor, 145-157 St John Street

    LONDON, EC1V 4PY, United Kingdom

    Tel: +44 (0)207 043 9750 Fax: +44 (0)207 043 9748

    Web: www.BloorResearch.com email: [email protected]
