1 Online data mining course Chapter 1: Introduction László Pitlik University Gödöllő, Institute...


Page 1: 1 Online data mining course Chapter 1: Introduction László Pitlik University Gödöllő, Institute of Computer Sciences Gödöllő, H-2100 Páter K. u. 1. 2008.XII.04.


Online data mining course
Chapter 1: Introduction

László Pitlik
University Gödöllő, Institute of Computer Sciences
Gödöllő, H-2100 Páter K. u. 1.

2008.XII.04.

Hello!

My name is Rob-But Ler, the Robot-Expert of My-X!

My boss will be here as soon as possible...

Until then, you can replay his message:

Page 2:

Greetings and introductions

• Welcome from the My-X project

• Occasion: a 'dress' rehearsal within the framework of the best English courses in the world!

• Briefly about our symbols:

• and our keywords: sustainability, balance, equilibrium, consistency

• and finally about myself…

Online data mining course – Chapter 1: Introduction

Pitlik
Good morning, everyone! It is a great pleasure for me to welcome you here, at the 'dress' rehearsal of the online data mining course at the virtual university of My-X. My-X stands for MY=YOUR eXpertises. The abbreviation is worth noting: this is a pioneering project, and it might be called one of the first web 3.0 service packages...
Pitlik
As you know, the approaching end of the best English courses in the world provides the occasion for today's presentation. Roxane asked me to give you an introduction to our online service tool.
Pitlik
Before we start, very briefly about our symbols. Firstly, our domain icon: it was created from a well-balanced stone tower, which was rotated and mirrored in every possible direction in order to form an X (specifically the My-X symbol)...
Pitlik
Secondly, the compass: it signals the advisory (signposting) character of the services. Thirdly, the scale: it represents our intentions and keywords, such as sustainability, balance, equilibrium and consistency...
Pitlik
My name is László Pitlik, and I am working (even now) on finishing the My-X project. To explain each feature of this project, it offers this online data mining course at the web address "my dash x dot h u"... And now let us start with the outline of the presentation... (next slide)
Page 3:

Outline of the presentation

• Aims of this course

• A test-question (in advance and after that too)

• Further questions, to initialize the common thinking

• Theoretical background or near to the heresy?!

• Didactical background (how to learn?)

• List of the course units (what to learn?)

• Summary and conclusions

• One solution of the test-question


Pitlik
As you can all see from the outline (aims, initial questions, backgrounds, course units, conclusions), by the end of this session you will know enough about the topics to decide whether or not you need the knowledge elements on offer.
Pitlik
I have prepared a handout with the main points of my presentation. Please take one each and pass them around... (folder) SORRY, I wanted to give you a copy, but unfortunately I have left the copies in my room...
Pitlik
If you have any questions while I talk, please note down your keywords together with the slide title. At the end, I hope, we will have the time needed to clarify each problem you raise.
Pitlik
So! That was an overall look at the main points of the presentation. Let us now turn to each issue step by step. First, the aims... (next slide)
Page 4:

Aims of the course

On the basis of previous projects you can follow, step by step,

• how to prepare (e.g. by pivot tables), and
• how to manage (see OLAP) the necessary project databases,
• how to define similarity problems,
• including their controlling aspects, and
• how to make online and offline analyses, and
• how to interpret and
• describe the calculated results
• (preferably) in an online expert system.

By the end of the course you will know each step needed for successfully managing planning, decision making and forecasting.


Pitlik
...slide text...
Pitlik
As you may notice, certain keywords have already been highlighted. The online service provides further information starting from these jump-off points... I would now like to move on to the first test question, which probes your power of association... (next slide)
Page 5:

Test-Question (in advance)

Please „match” the following words, fragments and letters (a letter may be used more than once), and write a short story or (if that is more comfortable for you) some equations based on the antagonisms you discover:

Science    syn, sin, sis
Fusion     con, the

C E I H T Y


Pitlik
I am sure you have already seen that we can handle this test question only in English, can't we? If somebody had an appropriate solution for other languages, I would be really happy... Or could we find one by our next meeting?
Pitlik
At the end of this session we can talk about this English pun... In my experience, you may need further support to solve this problem. To provide it, two slides have been prepared:
Pitlik
...text slide...
Pitlik
First, one slide on initializing the common thinking, and after that one slide on the theoretical background (which partially explains the solution to this seemingly d-evil question...) (next slide)
Page 6:

Initializing the common thinking

• Do you know whether a prediction should in general be better for the shorter term or for the longer term?

If possible: Vote ratio by the audience

• Do you know whether an analysis based on more data records should be more correct?

If possible: Vote ratio by the audience

• Do you know whether an analysis tested on a large number of cases should fit better than one without testing?

If possible: Vote ratio by the audience


Pitlik
Well, we now have to clarify how far we think alike about the core problems of science. ...slide text...
Pitlik
Most people would answer: of course, (1) the longer term carries more uncertainty, (2) the more data, the better the model fit, (3) without testing we cannot build robust models...
Pitlik
HOWEVER, none of these answers is correct! WHY, do you think, should we not accept these opinions?
Pitlik
This, and still more, we can clarify and prove through discussion in the course... Here, without a detailed discussion, I would like to change direction and talk about the theoretical background of data mining... Let us go on to the most important theses of this session... (next slide)
Page 7:

Theoretical backgrounds OR near to the heresy?

• A phenomenon can only be labeled SCIENCE if it can be transformed into program code (e.g. a chess robot).

• Every other phenomenon belongs to artistic performance (e.g. studies, lectures and also this presentation).

• Human intuition brings the good ideas. But human intuition does not seem to be the only kind that exists (cf. K. Lorenz, 1942).

• All living creatures on earth have sensors to measure their (internal and external) environment.

• The measured values are continuously interpreted in order to find some connection between causes and reactions.

• “Heureka!” was already cried out at the very beginning of life!

• Data mining has to deliver possible connections based on the measured records.

• Therefore we can press our instinctive capability into source code.


Pitlik
In short, we need to read these sentences together, in order to achieve the necessary level of brainwashing at once...
Pitlik
...slide text...
Pitlik
You would surely ask here: why call it heresy (or heretical thinking)? The post-modern scientific canon often talks about the diversity of interpretations, but seldom about the necessity of synthesis, meta-philosophy, consistency, control, competition and automation, especially in the field of economics and the social sciences...
Pitlik
In order to reach critical mass in heresy, we need to change the education strategy and the didactic priorities... (next slide)
Page 8:

Didactical background (how to learn)

Sustainable education:
• Nothing irrelevant to store
• Strategic planning: consistency-based
• Operative thinking: market-oriented

Priorities or core knowledge elements:
• Efficiency through real time analyses
• Case-Based Reasoning (CBR) logic as core method
    • Most universal (benchmarking, forecasting / offline, online)
    • Most adaptable (free to set parameters, no programming)
• Competition of methods and searching strategies
    • Decision trees
    • Artificial neural networks
    • Monte-Carlo Methods (MCM) and genetic algorithms
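To make the CBR idea on this slide more tangible, here is a minimal sketch of generic case-based reasoning: retrieve the stored cases most similar to a new one and reuse their known outcomes. It is an illustration only, not the Solver- or LPS-based procedure used in the course. The case base, the Euclidean distance measure and the helper names `distance`, `retrieve` and `predict` are all assumptions made for this example.

```python
import math

# Hypothetical case base: (attribute vector, known outcome).
# All numbers are invented for illustration.
CASES = [
    ((1.0, 2.0), 10.0),
    ((2.0, 1.0), 12.0),
    ((8.0, 9.0), 40.0),
    ((9.0, 8.0), 42.0),
]


def distance(a, b):
    """Euclidean distance between two attribute vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, cases, k=2):
    """Retrieve step: return the k stored cases most similar to the query."""
    return sorted(cases, key=lambda case: distance(query, case[0]))[:k]


def predict(query, cases, k=2):
    """Reuse step: average the outcomes of the k most similar cases."""
    nearest = retrieve(query, cases, k)
    return sum(outcome for _, outcome in nearest) / len(nearest)


estimate = predict((1.5, 1.5), CASES)
print(estimate)
```

The same retrieve-and-reuse pattern could serve forecasting (the cases are past periods) or benchmarking (the cases are competing objects); only the content of the case base changes.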


Pitlik
This course was created in order to approach the vision of a sustainable university. The students at the University of the Future have to work (!), under the guidance of their teachers, on projects which were defined by potential employers.
Pitlik
Good performance should be rewarded just as it would be on the market. The students can also bring in real problems. In this case, marketing topics are included in the interdisciplinary education. The income realized ensures two pillars of sustainability. Firstly, the students should learn hardly anything that is irrelevant for them (see the next graph).
Pitlik
Secondly, we should know when education works sustainably (from an economic point of view): namely, if the operative course units are chosen in a market-oriented way (and quality control in the education process is based on consistency criteria).
Pitlik
In addition to the sustainability aspects, the virtual university of the future has the following priorities:
Pitlik
In order to reach real-time speed in analysis (but without black-box analysis tools), the students have to learn the highly robust CBR logic in the first phase.
Pitlik
By the way: CBR stands for case-based reasoning.
Pitlik
Case-based reasoning can be used for calculating predictions, for building explanatory or simulation models, and for benchmarking (e.g. price-performance analysis). A CBR algorithm can be defined offline as an MS Excel solution (based on its Solver module), but also as an online solution (e.g. LPS). LPS stands for Linear Programming System...
Pitlik
Decision trees and artificial neural networks (as function types), and MCM and genetic algorithms (as search strategies), will be covered at the end of this course.
Pitlik
So! On the next slide you can see the structure of the course. The following course units provide a holistic capability to solve problems alone and online... (next slide)
Page 9:

Learning strategies and their maintenance

(Figure: knowledge level (%) plotted against time (day) for two strategies, storage and usage; source: own calculations)

Pitlik
In order to explain this graph, I should tell a seemingly personal story: as a regular student, I always learned nearly every item (literally perfectly).
Pitlik
Therefore I reached a high level of knowledge really fast (knowledge that was hardly ever used). If I had wanted to maintain this level, I would have had to repeat each knowledge element continuously.
Pitlik
This graph shows the two possible learning strategies. The vertical axis represents the knowledge level in per cent. The horizontal axis shows the time in days.
Pitlik
The blue line represents the strategy called learning for storage, and its integral (the area under the line) indicates the resources needed for maintenance.
Pitlik
The other line shows the same for the strategy called usage-based learning. It means: learning should not be for its own sake. OR: learning and using may be quite different.
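The reading of the integral as maintenance effort can be sketched numerically. The example below uses the trapezoidal rule to compare the area under two curves. The sample points are invented for illustration and are not read off the original graph, and the function name `area_under_curve` is a hypothetical helper.

```python
def area_under_curve(days, levels):
    """Trapezoidal rule: area under a piecewise-linear knowledge curve."""
    if len(days) != len(levels):
        raise ValueError("days and levels must have the same length")
    total = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        total += width * (levels[i] + levels[i - 1]) / 2.0
    return total


# Hypothetical sample points, NOT taken from the original figure.
DAYS = [0, 100, 200, 300]
STORAGE = [100, 100, 100, 100]  # everything learned at once and kept up by repetition
USAGE = [0, 20, 40, 60]         # only what is actually used, built up step by step

storage_effort = area_under_curve(DAYS, STORAGE)
usage_effort = area_under_curve(DAYS, USAGE)
print(storage_effort, usage_effort)
```

With these invented numbers, the storage strategy consumes more than three times the maintenance resources (measured in per cent times days) of the usage-based one; the real ratio depends on the actual curves.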
Pitlik
Here I would like to change direction and go back to the features of sustainable education. So far I have spoken about the most important options among the learning strategies. (back)
Page 10:

List of the course units (what to learn)

• The world can be interpreted in the form of Object-Attribute-Matrices (OAM)!

• Anomalies of data asset management (Why is the preparation of an OAM so slow? How to avoid the anomalies?)

• Preparing OLAP (online analytical processing) databases (do it yourself, if nobody wants to do it)

• Using OLAP techniques for OAM (efficiency as the highest priority)

• How it is made: Expert system (rules as universal solution)

• CBR pattern (OAM from time series, or in benchmarking, or for production functions)

• Solver (be free offline)

• COCO (component based object comparison // be free online)

• Interpretation of results (chess robots for context-free situations)

• Standard expectations of studies (What may you not do, and what do you have to do, for a good study?)
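As a rough illustration of the first course unit, the sketch below pivots a list of raw records into an Object-Attribute-Matrix: one row per object, one column per attribute. The records, the object and attribute names and the helper `build_oam` are invented for this example. The course itself works with pivot tables and OLAP tools, which this plain-Python version only imitates.

```python
# Hypothetical raw records in "long" form: (object, attribute, value).
RECORDS = [
    ("farm_A", "yield", 5.2),
    ("farm_A", "cost", 310.0),
    ("farm_B", "yield", 4.8),
    ("farm_B", "cost", 290.0),
    ("farm_C", "yield", 6.1),
]


def build_oam(records):
    """Pivot (object, attribute, value) records into an object-attribute matrix.

    Returns the row labels, the column labels and the matrix itself.
    Missing object/attribute combinations are filled with None.
    """
    objects = sorted({obj for obj, _, _ in records})
    attributes = sorted({attr for _, attr, _ in records})
    lookup = {(obj, attr): value for obj, attr, value in records}
    matrix = [[lookup.get((obj, attr)) for attr in attributes] for obj in objects]
    return objects, attributes, matrix


objects, attributes, matrix = build_oam(RECORDS)
for obj, row in zip(objects, matrix):
    print(obj, dict(zip(attributes, row)))
```

The None entry for the missing cost of farm_C hints at one of the anomalies the second course unit refers to: gaps in the raw data only become visible once the matrix is laid out.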


Pitlik
The good news is, on the one hand, that you will be able to use many of these competencies (such as OAM, data asset management, OLAP, expert systems, CBR, Solver, COCO, etc.) quite simply in diverse situations... On the other hand, these units are very easy to learn. BUT:
Pitlik
Unfortunately, combining them usefully and efficiently will be more complicated...
Pitlik
Let me summarize the main aspects before we talk about the theoretical fundamentals. (next slide)
Page 11:

Summary

• We have defined strategic and operative aims (deriving from real problems)…

• We have checked, whether we see the same world around us…

• We would like to teach and learn only the most necessary competencies…

• We have seen in brief which competencies we should combine in order to approach real-time speed in the analysis…


Pitlik
As well as changes in our heads, we need to initiate changes in our environment: (next slide)
Pitlik
...slide text...
Page 12:

Conclusions

• We have data, methods, computers, networks, problems and unfortunately illogical restrictions in our General Problem Solving (GPS) strategies

• We have an icon: namely the chess-robot…

* * * therefore * * *

• We should ensure free access to each datum!
• We should learn from our own instincts!
• We have to transform intuitions into source code!
• We have to provide the new methods online as well!
• We have to teach people to think instead of to serve!
• We can detect the lack of equilibrium!
• We can always correct the wrong directions!

LET US DO THEM!

Pitlik
Don't we?
Pitlik
Shall we?
Page 13:

Thank you very much for your attention!

Further details:

[email protected]

http://miau.gau.hu/myx-free

Pitlik
And now, if you have any questions, please do not hesitate to ask me...
Page 14:

Pun?! Pros and cons

Science = Con-science (= TQM or con-sis-TENCY in thinking)

Syn-the-sis = Fusion (of each thesis)

Confusion = Sin-the-sis

Sin <> ETHIC

Syn-the-TIC => Artificial (Intelligence) => Robotics

http://en.wiktionary.org/wiki/conscience (incl. Etymology aspects)