Mineknowledge Magazine, Vol. I


description

The first volume of the magazine of http://mineknowledge.com, a data mining services company

Transcript of Mineknowledge Magazine, Vol. I

Page 1: Mineknowledge Magazine, Vol. I


The problem with data and why you do care
By Anna Skountzou

You are flooded with tons of data, day by day. Spreadsheets, reports, surveys, you name it. But you are neither an info junkie nor a number cruncher. You have no time to invest in it, yet you know that you are wasting precious insights and significant potential. Despair? No.

The problem is simple. Too much information in spreadsheets or other formats, lying around on your hard disk or in the cloud. Have you ever tried to imagine the amount of knowledge hidden inside? Or how such an input would make a difference for you and your company?

We bet that, even if it has already crossed your mind, you have neither the time nor the right techniques to exploit this data. And, until now, you were forced to hire a connoisseur of data analysis, something that proved to be costly and time-consuming. And that was the best case; typically you just did nothing.

Well, until now. Our team not only relieves you of all this, but also extends what you used to consider a statistical analysis. Put aside the fluffy terminology and start thinking of rules and patterns, all illustrated via expository writing and visualization schemes, ultimately contributing the least biased and most valuable signals for your decisions.

And keep in mind that latent knowledge presents both a hidden cost and a tremendous opportunity. There is no need to be stressed about that anymore: you may now just mine your knowledge!

You can visualize data mining as a process of searching for treasure buried in the sand or digging up rock to mine for gold, hence 'mining'; the tools we use, however, do it in a truly systematic and efficient way.

MINEKNOWLEDGE December 1, 2008

Page 2: Mineknowledge Magazine, Vol. I


Speaking of getting the most out of your data sets, we provide you with a rock-solid solution. The process goes like this:

1. You send your data to us.

2. You sit back, breathe some fresh air and enjoy every moment of your life in between.

3. You open your inbox and receive the very knowledge and secrets trapped in your data, unveiled.

And here are a few more reasons to choose us.

1. It is easy: Consultants, discussions, meetings. Forget them all. All you need to do is send us an email with your data set attached.

2. It is fast: Within a week, results and the very knowledge of your data set will pop up in your inbox.

3. It is fun: Honestly. Working with tons of data is so much fun, especially when others do all the work for you.

4. It is secure: Rest assured that we’ve done our best to keep your data and mining results safe and private (the latter may not apply to our free services).

5. It is clear: If statistics sound Greek to you, we’re speaking your language.

6. It is insightful: You already know that; we give you the most valuable insights on your data in return.

7. It is affordable: A datamine.it analysis costs either nothing or €500.

“A miner with a mattock in his hand is a very rough way to conceptualize the complexity and state of the art of the processes we execute. A diverse and extended set of exploration and filtering algorithms, next to a variety of learning and meta-learning techniques, are utilized, optimized and evaluated, while the problem is a computationally intensive one and demands a highly customized approach. So we’re putting human intelligence and our expertise in between various advanced artificial intelligence algorithms, to finally provide you with the very secrets trapped in your data, unveiled.”

George Tziralis

Our solution: why mineknowledge?
By Manos Androulakis


Data stand as the least biased input to decision making, a pure source of insights and knowledge.

Page 3: Mineknowledge Magazine, Vol. I


“Think of a simple process. Then, make it simpler. Try to find the steps that are still vague. Cut them off. Finally, ask your grandma what she cannot understand. This is how we make things happen in MineKnowledge.”

Iro Zacharidou

It’s simple. You just email us your data set in .xls, .csv or .txt format. And then we take over.

The file should be in a form like the one in the figure. Let us make it even clearer.

Columns in the data set are attributes, like color, size, value and purchase decision; whatever your data set is made of. Attributes may be numeric (numbers) or nominal (one or more words). You also need to define the ‘target’ attribute: the one or more characteristics you want us to focus on, whose behavior we explain based on all the other attributes.

You may also call each row an instance, a case or an example; in other words, a discrete set of values, one for each attribute (say, a person’s reply to a survey, a product’s characteristics in a list of products, quarterly updates in a balance sheet, you name it). And there are no limitations on the number of columns and rows in your set.
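To make the layout concrete, here is a minimal Python sketch of how a file in this form could be read and its attributes labeled as numeric or nominal. The sample values, the `attribute_types` helper and the choice of `buy` as the target are all invented for illustration; this is not the tooling the team uses.

```python
import csv
import io

# Hypothetical sample in the layout described above: one column per
# attribute, one row per instance. All values are invented.
SAMPLE_CSV = """color,size,value,buy
yellow,small,12.5,yes
green,large,30,no
red,medium,18,yes
"""

def attribute_types(csv_text):
    """Label a column 'numeric' if every value parses as a number, else 'nominal'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    types = {}
    for name in rows[0].keys():
        try:
            for row in rows:
                float(row[name])
            types[name] = "numeric"
        except ValueError:
            types[name] = "nominal"
    return types

types = attribute_types(SAMPLE_CSV)
target = "buy"  # assumption: the data owner names the target attribute
print(types, "target:", target)
```

In this sketch the target is simply one of the column names; everything else is treated as an input attribute.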

Clear enough? OK, that’s it. As long as you have the data set in that format, you just send it to us by email at go(AT)mineknowledge.com. We’ll reply with a confirmation that we received an appropriate file, plus an invoice via PayPal. And, within a week, you’ll have a fully fledged mineknowledge report waiting in your inbox. We think it’s simple.

Pricing

There are two pricing plans:

FREE: You can have the whole report for free if your data set has fewer than 30 columns (attributes) and 300 rows (instances), and you agree that we may publish the analysis on our blog. In this case, the report may take up to a month.

MAX: No restrictions at all, delivery within a week, fully fledged analysis for a €500 flat price.
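The two plans can be summed up as a small decision rule. The Python sketch below is only an illustration: the `pricing_plan` function is hypothetical, and it assumes that "less than" applies strictly to both the column and the row limit.

```python
def pricing_plan(n_columns, n_rows, agrees_to_publish):
    """Return (plan, price in euros, delivery time) for a data set.

    Assumption: FREE requires strictly fewer than 30 columns AND strictly
    fewer than 300 rows, plus consent to publish the analysis on the blog.
    """
    if n_columns < 30 and n_rows < 300 and agrees_to_publish:
        return ("FREE", 0, "up to a month")
    return ("MAX", 500, "within a week")

# A small survey whose owner agrees to publication qualifies for FREE.
print(pricing_plan(12, 250, True))
# A larger data set, or one that must stay private, falls under MAX.
print(pricing_plan(45, 10000, True))
print(pricing_plan(12, 250, False))
```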

You may take a look at a typical datamine.it results report on our website. Meanwhile, we are waiting for your data sets!

The process: Hassle free
By Eleftheria Kanavou


Page 4: Mineknowledge Magazine, Vol. I


You used to think of column graphs and pie charts as the most insightful views you could expect from a data analysis. You’ll probably change your mind. Let’s take a look at a survey example.

A simplistic one. Consider the data set described on the previous page.

Let’s say it refers to answers gathered through a survey, or stored in your enterprise database. A typical analysis will finally come up with some graphs, like the ones following.

And you’re probably used to considering an analysis that contributes a graph like the one above as, well, fruitful. The same goes for the following one.

But the question remains. Is that the most you can expect from a data set analysis? Have you actually gained deep insights from your data? The answer is a clear no. Let us show you why.

What follows is a set of rules that emerge from a proper datamine.it analysis, even for a data set as oversimplified as the above example. Try this graph:

or, maybe this set of rules:

If color = yellow then buy = yes

If color = red then buy = yes

If color = white then buy = yes

If color = green then buy = no

If color = blue then buy = no

If color = black then buy = no

See the difference between the almost obvious and the really insightful?
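For readers curious how rules of this shape can be derived, here is a minimal Python sketch that maps each value of one attribute to the most frequent outcome seen with it. It is a deliberately simple majority-vote technique, not the method behind a datamine.it report, and the survey answers in `ANSWERS` are invented because the magazine's figure data is not reproduced here.

```python
from collections import Counter, defaultdict

# Invented (color, buy) survey answers, for illustration only.
ANSWERS = [
    ("yellow", "yes"), ("yellow", "yes"), ("yellow", "no"),
    ("green", "no"), ("green", "no"),
    ("red", "yes"),
    ("blue", "no"), ("blue", "yes"), ("blue", "no"),
]

def majority_rules(pairs):
    """Map each attribute value to the outcome that occurs most often with it."""
    counts = defaultdict(Counter)
    for value, outcome in pairs:
        counts[value][outcome] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

for color, buy in majority_rules(ANSWERS).items():
    print(f"If color = {color} then buy = {buy}")
```

A real analysis would weigh several attributes at once and report how reliable each rule is; this sketch only shows where an "if color = ... then buy = ..." statement can come from.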

Go find out more in a complete, typical mineknowledge report. Yes, this is what the report on your own data set will look like, after just a week. Still considering it? Check out our blog for more case studies. And send us your data, now.

An example: Surveys
By Athina Pandi


Page 5: Mineknowledge Magazine, Vol. I

A few more words about us
By Eirini Lygkoni

MineKnowledge is a group of young and passionate data engineers, each of us holding an engineering diploma from NTUA and an MSc or PhD in Applied Math, Statistics or Operations Research. We are located in Athens, Greece and London, UK.

• George Tziralis is clearly a data junkie who lives on his Mac. In the rare case he’s logged off, he enjoys dancing tango and organizing Open Coffee meetings around Greece. Apart from that, at 26 he is a serial entrepreneur, while he also teaches a post-graduate data mining course via a blog and tries to find some time to write up his PhD thesis on markets for forecasting.

• Athina Pandi is, alongside datamine.it, on her second MSc at Imperial College. Communications & Signal Processing is her latest interest, next to statistics, data mining and networks. When she is offline, you may find her in a pub around Hyde Park.

• Eleftheria Kanavou, with a strong inclination for dancing, is the one who naturally gives rhythm to the whole team. Her research interests include stochastic processes and behavioral statistics, while data mining is the natural outlet for her entrepreneurial attitude.

• Manos Androulakis is the algo geek of the team. The bigger the challenge and the data set, the more determined he is to find the next diamond to mine. His expertise lies in the areas of statistical designs, variable selection methods and medical applications, through the prism of data mining of course.

• Eirini Lygkoni is the epitome of doing magic under pressure. A multi-tasker by nature, she is addicted to statistics and probabilities, while she literally can’t wait for the next data set to arrive. At the same time, simplicity is her favorite word and socializing her activity of choice for her rare free time. Enough said.

• Lina Massou stands as the quiet power of the team. With a strong background in information theory and cryptography, she definitely is the one to take good care of your data and come up with their very knowledge, unveiled.

• Anna Skountzou excels at both statistics research and ecological conscience. That said, she’s definitely the one to look for when you want to extract patterns that let you put your data to much more efficient use, their green footprint included.

• Iro Zacharidou is a true data nut as well as a party animal, her deep statistical expertise aside. If you wonder about the outcome of these coming together, rest assured that the persistence and professionalism required to put your data under investigation will be greatly exceeded.

Data is our passion and mining our joy

We literally can’t wait to get our hands on your data; get ready to be impressed, or even excited, by the precious insights that you’ll receive in just a week.

MINEKNOWLEDGE
Athens, Greece | London, [email protected]
http://www.mineknowledge.com
