TIN2010-20900-C04-04 UPM Group

TIN2010-20900-C04-04 UPM GROUP

Concha Bielza

Computational Intelligence Group
Departamento de Inteligencia Artificial
Universidad Politécnica de Madrid
http://cig.fi.upm.es

Granada, May 12, 2011

C. Bielza UPM-Madrid

Outline

1 Team

2 Objectives

3 Tasks and commitments

4 Results

5 Collaborations within the project

12 Members and 10 EDPs

2 Full Professors
  Concha Bielza
  Pedro Larrañaga

2 foreign collaborators, Full Professors
  Tom Heskes (Nijmegen, The Netherlands)
  Qingfu Zhang (Essex, UK)

1 Supply Associate Professor
  Juan A. Fernández del Pozo

2 PostDoc Researchers
  Rubén Armañanzas (Juan de la Cierva researcher)
  Roberto Santana (Cajal Blue Brain Project)

5 PhD Students
  Hanen Borchani (FPI last TIN)
  Alfonso Ibáñez (Consolider)
  Hossein Karshenas (Consolider)
  Pedro L. López-Cruz (FPU)
  Diego Vidaurre (Cajal Blue Brain Project)

Objectives

1. Joint probability distribution function learning
  Definition of new scores and structural algorithms for learning PGMs. Ideas based on self-similarity, regularization, multicriteria, interaction, and with complex data (noisy, missing, high-dimensional)
  Learning the parameters (densities) in models with continuous variables

2. Supervised classification
  BN classifiers in problems with an imbalanced class
  Advances in the design of well-known BN classifiers (TAN, KDB, AODE, HAODE, WAODE, FBC, multinets...)
  Development of new methods for multi-dimensional classification
  Extensions to massive data sets and data streams
  Development of new methods to convert a classification problem into regression models
  Extension of PGMs to hybrid domains (discrete and continuous variables) for their application to classification and regression
  Extension to credal classifiers (use of imprecise probabilities)
  Algorithms for learning utility-based classifiers

Objectives

3. Inference
  Approximate algorithms for MTE hybrid networks, credal networks, probabilistic decision graphs, precise and imprecise influence diagrams, and for BNs using fast factorization and recursive trees
  Algorithms based on query importance sampling for hybrid Bayesian networks

4. Applications
  Technological applications: evolutionary computation, mobile robotics, requirements tracing and classification
  Life sciences: biomedicine, agriculture, environment, genomics
  Social domains: bibliometry, prediction of arrival times of city buses, detection of credit card fraud

Tasks as PI

JPD LEARNING

1.1 Scores for PGM learning

1.1.1 New scores, multi-objective and Lp-regularization-based
→ Learning BNs based on Lp-regularized scores, both in the space of DAGs and of equivalence classes (already available for GBNs [Vidaurre et al., 2010])
→ Learn structures by using multiobjective scores

1.2 New structure learning algorithms

1.2.1 Algorithms based on self-similarity
→ Define a BN that admits the self-similarity property (and in 3D)
→ New learning algorithms from data (score+search) and even new simulation methods
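
As a rough illustration of the regularization idea in task 1.1.1, the following Python sketch picks the parents of one node of a Gaussian BN by L1-penalized regression on its predecessors under a fixed variable ordering. The ordering-based setup, the coordinate-descent solver, the toy data and all names (soft_threshold, select_parents, lam) are assumptions made for this example; it is not the method of [Vidaurre et al., 2010], which works in the space of DAGs and of equivalence classes.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by L1-penalized least squares."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def select_parents(candidates, child, lam, n_sweeps=100):
    """Minimize (1/2n)*||child - sum_j beta_j*candidates[j]||^2 + lam*sum_j |beta_j|
    by cyclic coordinate descent.

    candidates: list of columns (lists of floats), one per predecessor node.
    Returns one coefficient per candidate; a zero coefficient is read as
    "no arc from that candidate to the child" (hypothetical convention).
    """
    n = len(child)
    beta = [0.0] * len(candidates)
    resid = list(child)  # residual = child - current fitted values
    for _ in range(n_sweeps):
        for j, col in enumerate(candidates):
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for c, r in zip(col, resid)]
                beta[j] = new_beta
    return beta


# Toy data (made up): x2 depends on x0 only; x1 is an irrelevant predecessor.
x0 = [1.0, -1.0, 1.0, -1.0]
x1 = [1.0, 1.0, -1.0, -1.0]
x2 = [2.1, -2.1, 1.9, -1.9]  # 2*x0 plus a perturbation orthogonal to x0 and x1

beta = select_parents([x0, x1], x2, lam=0.1)
parents = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, parents)
```

The penalty shrinks the coefficient of x0 and sets the coefficient of x1 exactly to zero, which is how an L1-type score can act on the structure and not only on the parameters.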

Schedule

yellow=PI; grey=collaboration

Tasks as PI

SUPERVISED CLASSIFICATION

2.3 Multi-dimensional classification with PGMs
→ Learn general multi-dimensional Bayesian network classifiers from data
→ New type of decomposable models (max connected components) to alleviate the computational burden of MPE computation
→ Adapt the ideas of random forests to this context using tree-tree or polytree-polytree
→ Develop a stratified CV scheme in this context
→ Missing data
→ MPE provided by the consensus of partial MPEs of small components
→ Extension to logistic regression and to regression with multiple outputs
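
To see why decomposability eases the MPE computation in task 2.3, here is a minimal Python sketch under strong assumptions: binary class variables, and a posterior over the class vector that, for a fixed feature vector, factorizes into one table per connected component. The tables, their values and the function names are hypothetical and chosen only for illustration; this is not the project's algorithm.

```python
from itertools import product


def mpe_bruteforce(n_classes, components):
    """Joint MPE by enumerating all 2**n_classes class vectors."""
    best, best_score = None, float("-inf")
    for c in product([0, 1], repeat=n_classes):
        score = 1.0
        for idx, table in components:
            score *= table[tuple(c[i] for i in idx)]
        if score > best_score:
            best, best_score = c, score
    return best


def mpe_by_components(n_classes, components):
    """MPE assembled from the partial MPEs of each connected component."""
    c = [None] * n_classes
    for idx, table in components:
        local_best = max(table, key=table.get)  # partial MPE of this component
        for i, value in zip(idx, local_best):
            c[i] = value
    return tuple(c)


# Hypothetical unnormalized factors for a fixed feature vector x.
# Each entry: (indices of the class variables in the component, table).
components = [
    ((0, 1), {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}),
    ((2,), {(0,): 0.7, (1,): 0.3}),
    ((3, 4), {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.6, (1, 1): 0.2}),
]

joint = mpe_bruteforce(5, components)
partial = mpe_by_components(5, components)
print(joint, partial)
```

In this toy case the brute-force search scores 32 class vectors, whereas the component-wise search inspects 4 + 2 + 4 = 10 table entries; both return the same configuration because the components share no class variables.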

2.4 Extensions to large data sets
→ Massive databases: transform the classification into a regression and analyze the data set as (smaller) blocks
→ Data streams + some unlabeled observations: adapt the EM algorithm
→ Data streams in multidimensional classification problems

2.5 Relationship between classification and regression
→ Convert CPTs of a BN into logistic models (parametric and parsimonious), beyond BN classifiers with perfect independence graphs
→ Locally weighted regression to solve highly nonlinear and sparse problems
→ Use regularization to help in FSS and then use the regression to solve classification problems
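
The link between CPTs and logistic models in task 2.5 can be checked on the well-known special case of a naive Bayes classifier with a binary class and binary features, whose posterior is exactly a logistic function of the features. The Python sketch below covers only that special case (the task itself aims beyond it); the CPT values and the function names are made up for illustration.

```python
import math
from itertools import product


def nb_posterior(prior1, cpts, x):
    """P(C=1 | x) computed directly from the naive Bayes CPTs.
    cpts[i] = (P(X_i=1 | C=0), P(X_i=1 | C=1))."""
    like = [1.0 - prior1, prior1]
    for (p0, p1), xi in zip(cpts, x):
        like[0] *= p0 if xi else 1.0 - p0
        like[1] *= p1 if xi else 1.0 - p1
    return like[1] / (like[0] + like[1])


def cpts_to_logistic(prior1, cpts):
    """Intercept and weights of the logistic model with the same posterior."""
    intercept = math.log(prior1 / (1.0 - prior1))
    weights = []
    for p0, p1 in cpts:
        off = math.log((1.0 - p1) / (1.0 - p0))  # contribution when X_i = 0
        on = math.log(p1 / p0)                   # contribution when X_i = 1
        intercept += off
        weights.append(on - off)
    return intercept, weights


def logistic_posterior(intercept, weights, x):
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical CPTs for three binary features.
prior1 = 0.3
cpts = [(0.2, 0.7), (0.5, 0.9), (0.6, 0.1)]
intercept, weights = cpts_to_logistic(prior1, cpts)

# Largest disagreement between the two posteriors over all feature vectors.
max_gap = max(
    abs(nb_posterior(prior1, cpts, x) - logistic_posterior(intercept, weights, x))
    for x in product([0, 1], repeat=len(cpts))
)
print(max_gap)
```

The logistic form needs one weight per feature plus an intercept, which is the sense in which such a representation is parametric and parsimonious.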

Schedule

Tasks as PI

APPLICATIONS

4.1 Technological applications

4.1.1 Evolutionary computation
→ Parameter control in evolutionary computation, with parameter values changing during the run
→ Regularization in EDAs, with many generations but with learning and simulation steps based on reduced populations

4.2 Life Sciences

4.2.1 Applications to Biomedicine
→ Predict how HIV mutations influence resistance to many HIV drugs (Hospital Carlos III, INAOE; Task 2.3)
→ Neuroinformatics: Modelling and simulation of dendritic morphology (Task 1.2.1)
→ Neuroinformatics: Discrimination between Alzheimer’s disease patients and controls based on microarray data

4.3 Social domains

4.3.1 Applications to bibliometry
→ Relationships, redundancies and properties of different indices that measure the scientific productivity of a researcher
→ Predict how these indices evolve in time
→ Real data from Spanish researchers in CCIA, LSI and ATC areas collected to have a picture of the Spanish productivity

Schedule

Commitments

Publish, per group and year, ≥ 4 JCR papers + 5 papers in conference proceedings

Participation in conferences, with communications, tutorials and as organizers

Apply for patents (e.g. dendritic morphology and classification problems for Alzheimer’s disease and HIV drug resistance)

Publications

JCR (with acknowledgments to this project)

Bielza, C., Li, G., Larrañaga, P. (2011) Multi-dimensional classification with Bayesian networks, International Journal of Approximate Reasoning 52, 705-727

García-Torres, M., Armañanzas, R., Bielza, C., Larrañaga, P. (2011) Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data, Information Sciences

Armañanzas, R., Saeys, Y., Inza, I., García-Torres, M., Bielza, C., van de Peer, Y., Larrañaga, P. (2011) Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms, IEEE/ACM Transactions on Computational Biology and Bioinformatics 8, 760-774

López-Cruz, P., Bielza, C., Larrañaga, P., Benavides-Piccione, R., DeFelipe, J. (2011) Models and simulation of 3D neuronal dendritic trees using Bayesian networks, Neuroinformatics

Ibáñez, A., Bielza, C., Larrañaga, P. (2011) Using Bayesian networks to discover relationships between bibliometric indices. A case study of computer science and artificial intelligence journals, Scientometrics

Publications

Conferences (with acknowledgments to this project)

Karshenas, H., Santana, R., Bielza, C., Larrañaga, P. (2011) Multi-objective optimization with joint probabilistic modeling of objectives and variables, Evolutionary Multi-Criterion Optimization (EMO-2011), Lecture Notes in Computer Science, 6576, 298-312, Springer

Zaragoza, J., Sucar, E., Morales, E., Bielza, C., Larrañaga, P. (2011) Bayesian chain classifiers for multidimensional classification, IJCAI-2011

Santana, R., Bielza, C., Larrañaga, P. (2011) Affinity propagation enhanced by estimation of distribution algorithms, Proceedings of the 2011 Genetic and Evolutionary Computation Conference (GECCO-2011)

Santana, R., Karshenas, H., Bielza, C., Larrañaga, P. (2011) Quantitative genetics in multi-objective optimization algorithms: From useful insights to effective methods, Proceedings of the 2011 Genetic and Evolutionary Computation Conference (GECCO-2011)

Santana, R., Karshenas, H., Bielza, C., Larrañaga, P. (2011) Regularized k-order Markov Models in EDAs, Proceedings of the 2011 Genetic and Evolutionary Computation Conference (GECCO-2011)

Borchani, H., Bielza, C., Larrañaga, P. (2011) Learning multi-dimensional Bayesian network classifiers using Markov blankets: A case study in the prediction of HIV protease inhibitors, Probabilistic Problem Solving in BioMedicine (ProBioMed’11) at AIME

Other activities

Tutorial at CAEPIA-2011 (Nov’11): “Aprendizaje Automático y Optimización en Neurociencia” (Machine Learning and Optimization in Neuroscience) (Bielza, Larrañaga)

2nd position at the MEG mind reading challenge within the International Conference on Artificial Neural Networks (Santana, Bielza, Larrañaga)

Near future

Accepted stays for European thesis

D. Vidaurre at Nijmegen with T. Heskes (2012)

H. Borchani at Utrecht with L. van der Gaag (2012)

Proposals: Projects and contracts

European Project FET Flagship Initiative Preparatory Actions [granted: FP7-ICT-2011-FET-F-284941] during 2011

European project (Subprogramme ERASMUS within the Lifelong Learning Programme) for 1 year: “Towards a rational policy decision making on Erasmus mobility based on intelligent data analysis”

National Network Atica on Applied Computational Intelligence

National project (CDTI) with Incita on multimodal biometry

National project (Avanza-Innpacto) with Apara and Fundacion CIEN on Parkinson’s disease

Near future

Stays at UPM

Barbara Pieters (Utrecht U) Nov’10

Kangil Kim (National U of Seoul) Feb-Aug’11

Ferran Reverter (UB) Oct-Nov’11

Collaborations

Maestu-Nevado at CTB (UPM) in neuroscience

Hospital de la Santa Creu i Sant Pau (Barcelona) in neuroscience

Instituto Cajal and Columbia U. in neuroscience

P. Rudomin, Ulises Cortés, Erika Rodríguez (Mexico-UPC) in neuroscience

G. Ascoli (George Mason U) in neuroscience

CIEMAT in bioinformatics

Hospital Carlos III in HIV

4 groups collaborating

Specializations

Granada: imprecise probabilities and decision making, approx. inference algorithms

Almería: continuous variables, approx. inference algorithms

Albacete: abductive inference, machine learning algorithms and metaheuristics

Madrid: machine learning algorithms, decision making and evolutionary computation

Expected collaborations

Granada-Madrid: defining scores for structural learning, learning with imprecise probabilities and modelling with influence diagrams for decision making

Almería-Albacete: learning hybrid networks

Granada-Almería: decision making

Albacete-Madrid: multi-dimensional classification where the class vector assignment is performed by means of abductive inference

4 groups collaborating

Joint expected activities

Mobility and exchange of researchers

Shared supervision of PhD theses and research works (D.E.A.)

4 workshops, one in each city (work carried out, next plans, difficulties found): months 5, 14, 23 and 31 approx.

The 4 project supervisors will hold intermediate meetings, at least one per semester

A server to make the papers, software and documentation generated accessible to all the participants

Inclusion of procedures in some open software tool (e.g. Elvira, WEKA, Mateda, ProGraMo) or by making available to the scientific community some specific routines

Application results extended to conferences and journals not specific to PGMs; use of results/software by EPOs. Ours are: Atos, Instituto Cajal, Panda Security

Keywords per member

T. Heskes → regularization and neurocomputing

Q. Zhang → evolutionary computation

J.A. Fernández del Pozo → decision analysis, evolutionary computation

R. Armañanzas → evolutionary computation, classification, bioinformatics

R. Santana → evolutionary computation, neuroscience

H. Borchani → multi-dim classification, semi-supervised, data streams, missing data

A. Ibáñez → classification, bibliometry

H. Karshenas → evolutionary computation

P.L. López-Cruz → new Bayesian classifiers, neuroscience

D. Vidaurre → regularization, continuous variables, neuroscience

Proposals to collaborate

With Granada

Advances in known Bayesian network classifiers: PGMs for hybrid domains in classification (MOP)

Learning IDs from data

With Albacete

Abductive inference for multi-dimensional classification (constraints on the values of the class vector)

With Almería

Learning MOPs from data
