Department of Computer Science
2015 Research Areas and Projects
Data Mining and Machine Learning Group (UH-DMML)
Its research focuses on:
1. Spatial Data Mining
2. Clustering and Anomaly Detection
3. Classification and Prediction
4. GIS
Current and Planned Projects
1. Clustering Algorithms with Plug-in Fitness Functions and Other Non-Traditional Clustering Approaches
2. Analyzing and Doing Useful Things with Bio-aerosol Data (quite new)
3. Using Mixture Models for Anomaly Detection and Change Analysis (quite new)
4. Interestingness Scoping Algorithms for the Analysis of Spatial and Spatio-temporal Datasets
5. Taxonomy Generation: Learning Class Hierarchies from Training Data
6. Understanding, Preventing, and Recovery from Flooding (just starting)
7. Educational Data Mining (led by Nouhad Rizk)
1. Non-Traditional Clustering Algorithms
Topics: Clustering Algorithms with Plug-in Fitness Functions; Mining Spatio-Temporal Datasets; Parallel Computing; Prototype-based Clustering; Agglomerative Clustering Algorithms; Clustering Polygons and Trajectories
Algorithms: MOSAIC, STAXAC, CLEVER, AVALANCHE
[Figure: Illustration of MOSAIC’s approach (input and output)]
2. Understanding and Doing Useful Things with Bio-aerosol Data
Definition: A bioaerosol (short for biological aerosol) is a suspension of airborne particles that contain living organisms or were released from living organisms.[1] These particles are very small and range in size from less than one micrometer (0.00004") to one hundred micrometers (0.004").
Research Questions
Characterization of the Bio-aerosol Composition at a Particular Location
Anomaly Detection and Change Analysis for Bio-aerosols
Understanding Disease Spread
Sensor-based Bio-aerosol Early Warning Systems
…
[1] Wathes, Christopher M.; Cox, C. Barry (1995). Bioaerosols handbook. Chelsea, Mich: Lewis Publishers. ISBN 0-87371-615-9.
3. Using Mixture Models for Anomaly Detection and Change Analysis
The Sensor Modeling Toolbox will be used for the following tasks:
Change analysis and anomaly detection (based on sensor readings)
Creating background models of particular sensors at particular locations
Development of sophisticated threat assessment functions that operate on top of the toolbox
[Figure: Sensor Modeling Toolbox. Set of Sensor Readings → Model Fitting → Probabilistic Model → Analysis Function 1, Analysis Function 2, ..., Analysis Function k]
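The toolbox pipeline (sensor readings, model fitting, probabilistic model, analysis functions) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the toolbox itself: the function names, the sample readings, and in particular the use of a single Gaussian as the background model, which stands in for the mixture models the project actually targets.

```python
from statistics import mean, stdev

def fit_background_model(readings):
    """Model fitting step: summarize historical readings of one sensor
    as a single Gaussian (an assumed stand-in for a richer model)."""
    return {"mean": mean(readings), "std": stdev(readings)}

def anomaly_score(model, reading):
    """Analysis function (hypothetical): absolute z-score of a new reading
    under the background model; larger means more unusual."""
    return abs(reading - model["mean"]) / model["std"]

def mean_shift(model, recent_readings):
    """Analysis function (hypothetical): shift of the recent mean relative
    to the background mean, in background standard deviations."""
    return (mean(recent_readings) - model["mean"]) / model["std"]

# Made-up readings for one sensor at one location.
background = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.0]
model = fit_background_model(background)

print(anomaly_score(model, 10.1))   # small score: consistent with background
print(anomaly_score(model, 14.0))   # large score: candidate anomaly
print(mean_shift(model, [10.4, 10.6, 10.5]))
```

Keeping the analysis functions separate from the fitted model mirrors the diagram: any number of analysis functions can be added without refitting the background model.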
Gaussian Mixture Models
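A Gaussian mixture model uses a parametric probability density function represented as a weighted sum of Gaussian component densities:
$$p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)$$
$\pi_k$ = Prior probability (weight) of the kth component Gaussian.
$\mu_k$ = Mean of the kth Gaussian.
$\Sigma_k$ = Covariance matrix of the kth Gaussian.
$x$ = Data point under consideration.
$K$ = Total number of Gaussian components.
$N(x \mid \mu_k, \Sigma_k)$ = Density of $x$ in the kth Gaussian, which for $d$-dimensional data is
$$N(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k)\right)$$
Workflow: Data Set → EM → BIC/Akaike/… → Model Selection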
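The workflow above (fit with EM, then select the number of components with an information criterion) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the group's implementation: the data are one-dimensional, so each covariance matrix reduces to a scalar variance; the function names, the quantile-based initialization, the variance floor, and the synthetic data are all hypothetical choices.

```python
import math
import random

def gaussian_pdf(x, mu, var):
    """Density of x under a 1D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, mus, variances):
    """p(x) as the weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))

def fit_gmm_em(data, k, iterations=100, min_var=1e-3):
    """Fit a k-component 1D mixture with EM; returns weights, means,
    variances, and the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: spread the means over the data quantiles.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    log_lik = sum(math.log(mixture_pdf(x, weights, mus, variances) + 1e-300) for x in data)
    return weights, mus, variances, log_lik

def bic(log_lik, k, n):
    """BIC for a 1D mixture: k means + k variances + (k - 1) free weights."""
    num_params = 3 * k - 1
    return num_params * math.log(n) - 2.0 * log_lik

# Synthetic data drawn from two well-separated Gaussians.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(8.0, 1.0) for _ in range(200)]

fits = {k: fit_gmm_em(data, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][3], k, len(data)) for k in fits}
best_k = min(scores, key=scores.get)  # lower BIC is better
print(scores, best_k)
```

Swapping `bic` for an AIC function (penalty `2 * num_params` instead of `num_params * log(n)`) changes only the selection step; the EM fit is unaffected.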
4. Interestingness Hotspot Discovery Framework for Grids
Objective: Find interesting hotspots in 4D grid-based datasets using plugin interestingness functions.
Methodology:
Find hotspots in grid-based spatio-temporal datasets using hotspot discovery algorithms and clustering techniques.
Employ plugin interestingness and reward functions to guide the search for “good” hotspots.
Generate cluster summaries.
Visualize 4-dimensional spatio-temporal clusters and cluster summaries.
Dataset: We are working on a 4-dimensional grid-based air pollution dataset. Each grid cell covers a 4x4 km area. There are 150,000 4D grid cells. Grid cells have latitude, longitude, layer (altitude), and time dimensions. Each grid cell is associated with hourly observations of 132 compounds in the air.
[Figure: Low variation hotspots]
Interestingness Hotspot Discovery Framework for Grids
Problem: Find 4D contiguous regions maximizing a plugin reward function:
$$\mathrm{Reward}(R) = \mathrm{interestingness}(R) \times \mathrm{size}(R)^{b}$$
Here interestingness(R) is a function of the correlation of the 2 variables in the region R and of the reward threshold th, where 0 < th < 1. Currently we are using Ozone and PM2.5 levels as variables.
[Figure: A highly correlated region, shown as the Ozone concentration in the region next to the PM2.5 concentration in the region]
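A small sketch may help make the reward computation concrete. The interestingness definition used here is an assumption, since the slide only states that it depends on the correlation and on the threshold th: it is taken to be the amount by which the absolute Pearson correlation exceeds th, and zero otherwise. The function names, the default values of `th` and `b`, and the sample values are likewise hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long value lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def interestingness(xs, ys, th):
    """ASSUMED form: excess of |correlation| over the threshold th, else 0."""
    c = abs(pearson(xs, ys))
    return c - th if c > th else 0.0

def reward(xs, ys, th=0.5, b=1.1):
    """Reward(R) = interestingness(R) * size(R)^b, where size is the number
    of grid cells in the region and th, b are hypothetical defaults."""
    return interestingness(xs, ys, th) * len(xs) ** b

# Made-up per-cell values for one candidate region.
ozone = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(reward(ozone, pm25))          # positive: strongly correlated region

# A region whose two variables are uncorrelated earns no reward.
print(reward([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]))
```

With b greater than 1, merging two equally interesting regions yields more reward than keeping them apart, which is one way a size exponent can steer the search toward larger hotspots.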
5. Taxonomy Generation
[Figure labels: Taxonomy Generation Algorithm; Datasets]
6. Understanding, Preventing, and Recovery from Flooding
UH CeSAR Symp. 7/24/2015
Center for Sustainability and Resiliency
Helping Scientists to Make Sense Out of their Data
Figure 1: Co-location regions involving deep and shallow ice on Mars
Figure 2: Interestingness hotspots where both income and CTR are high.
Figure 3: Mining hurricane trajectories
Some UH-DMML Graduates 1
Christoph F. Eick
Dr. Wei Ding, Associate Professor, Department of Computer Science, University of Massachusetts, Boston
Sharon M. Tuttle, Professor, Department of Computer Science, Humboldt State University, Arcata, California
Christopher T. Ryu, Professor, Department of Computer Science, California State University, Fullerton
Sujing Wang, Assistant Professor, Department of Computer Science, Lamar University, Beaumont, Texas
Some UH-DMML Graduates 2
Chun-sheng Chen, PhD, Amazon
Chong Wang, MS, Halliburton
Justin Thomas, MS, Section Supervisor at Johns Hopkins University Applied Physics Laboratory
Mei-kang Wu, MS, Microsoft, Bellevue, Washington
Jing Wang, MS, AOL, California
Rachsuda Jiamthapthaksin, PhD, Faculty, Assumption University, Bangkok, Thailand
Students in the UH-DMML Research Group
PhD Students: Yongli Zhang, Fatih Akdag, Nguyen Pham, Chong Wang and Paul Amalaman.
Master's Students: Puja Anchlia, Riny Hutapea and Rohit Jidagam.
Undergraduate Students: none at the moment