Big Data Analytics for - uni-jena.de · project and its popularity as a big data processing engine...
Prof. Dr. Taysir Hassan A. Soliman
Vice Dean for Graduate Studies & Research, Faculty of Computers & Information, Assiut University
Assiut University BioDialog PI
Nov. 16, 2016
Big Data Analytics for BioDiversity
Outline
• About Assiut
• Assiut University
• Faculty of Computers & Information
• Research Interests
• Biodiversity Informatics Previous Activities at Assiut University
• Visits and examples of Biodiversity in Egypt
• Big data research and biodiversity
Assiut
• Gamal Abdel Nasser, second President of Egypt
• Jalal al-Din al-Khudayri al-Suyuti, Egyptian religious scholar
• Hafez Ibrahim, poet
• Amin Mohsen, diplomat
• Mustafa Lutfi al-Manfaluti, writer and poet
• Pope Shenouda III of Alexandria, Pope of the Coptic Orthodox Church
Jalal El Din El Suyuti
A Few Pictures From Assiut City
The Dam
A Walk beside the Nile
Assiut University Entrance
The Nile
Assiut University Map
Assiut University
• Assiut University was established in October 1957 as the first university in Upper Egypt. Its aim is to prepare highly qualified graduates with basic specialized academic knowledge and practical training in the necessary skills.
Faculties & Institutes
• Faculties: 18
• Institutes: 2 (Sugar Industry, Oncology Institute)
• International students: Yemen, Malaysia, Kuwait, Iraq
http://www.aun.edu.eg/
Faculty of Computers & Information Assiut University
Lab Building Administrative Building
Established in 2001
Faculty of Computers & Information Assiut University (Staff)
• Information Systems (1 professor, 1 assistant professor, 4 teaching assistants, 7 demonstrators)
• Information Technology (1 professor, 3 assistant professors, 2 teaching assistants, 6 demonstrators)
• Computer Science (2 professors, 3 associate professors, 3 associate lecturers, 6 teaching assistants, 10 demonstrators)
• Multimedia Systems (1 associate professor)
Faculty of Computers & Information Assiut University (Facilities)
• Undergraduate labs: 9
• Lecture halls: 9
• Specialized labs: 5 (GIS, Multimedia, HP, Big Data, and Bioinformatics)
• Research labs: 5
Geographic Information System Labs
The GIS Lab consists of three modules:
• GIS Undergraduate Lab
• GIS Research Unit
• GIS Servers Unit
Geographic Information System Labs
Contents:
• 20 computers (Dell OptiPlex 380; Intel Core 2 Duo, 2 GB of RAM)
• 1 plotter (HP Designjet T1200) for printing geographical maps
• 1 data show projector with its display board
Asyut Medical & Public Services Application
Clinics Medical Centers pharmacies
Medical Labs
Ambulance
Public Services
Multimedia Lab
Multimedia Production Unit
Multimedia Production Unit
(Voice Recording Unit)
Multimedia Research Unit
Bioinformatics Research Lab & Big Data Labs
Information Systems Dept. Research Directions
Big Data Analytics
BioDiversity Informatics
Database Management
Data Mining
Semantic Data Integration
Recommender Systems
Bioinformatics
GIS
Health Informatics
Computer Science Dept. Research Directions
Software Engineering
Distributed Computing
Computer Vision
Image Processing
High Performance Computing
Cloud Computing
Artificial Intelligence
Information Technology Dept. Research Directions
Ad Hoc Networks
Internet of Things
Mobile Computing
Vision and Robotics
Network Security
Cloud Computing
Broadcasting and media technologies
Biodiversity Informatics Previous Activities at Assiut University
BioDiversity Informatics Workshop at Faculty of Computers and Information, Assiut University
• Number of scientists: 34 (Faculty of Science, Computers and Information, Agriculture, EELU) and 17 (teaching assistants)
• Number of undergraduate students: 156
• Number of employees: 9
• A total of 216 attendees
BioDiversity Informatics Research Group
Prof. Dr. Taysir Hassan, Vice Dean for Graduate Studies & Research, Faculty of Computers & Information, Assiut University (PI)
Prof. Dr. Medhat Moreed, Vice Dean for Societal Services and Environmental Development, Faculty of Science, Assiut University
Prof. Dr. Adel AbuElmagd, Dean of the Faculty of Computers & Information, Assiut University
Prof. Dr. Ahmed Moharam, Vice President of the Fungi Research Institute, Assiut University
Marwa Hussein, Assistant Lecturer, Information Systems Department, Faculty of Computers and Information, Assiut University
Majid Askar, Assistant Lecturer, Computer Science Department, Faculty of Computers and Information, Assiut University
Dr. Ahmed Taloba, Assistant Professor, IS Department, FCI, Assiut University
Dr. Ahmed Albanhawy, Assistant Professor, Botany Department, Faculty of Science, Suez Canal
From AinShams Workshop Sept. 2016
Wady El-Hetan
Why Big Data?
• We need big data to understand the distribution of biodiversity.
• Once scientific data becomes essential, transparency will be a must (publications and accessibility) … ecological data access.
• Science-driven data.
• In global ecology, we go with problems that
Global Environmental Changes
• Habitat loss and species extinction
• Where will animals move to survive?
• Will human development prevent them from getting there?
Solution: conservation strategies are a crucial step toward minimizing biodiversity loss.
• Ocean acidification and land use
Global BioDiversity and Human Health
Fresh Water
Infectious Diseases
Air Quality
Agriculture
Role of Plants Pharmaceuticals
WHO Report
• Measuring traits of individual organisms (nitrogen concentrations)
• Species distribution datasets (flora, fauna, geographic associations with museum data)
Questions?
• Is it a “data-driven” or a “knowledge-driven” science?
• What are examples of research questions we can solve by relating big data to biodiversity informatics?
• In which phases of the big data life cycle can we extract research questions for biodiversity informatics?
Example 1: Identify Biodiversity Hotspots
• It is widely acknowledged that biodiversity is much more than just the number of species in a region, and a conservation strategy cannot be based merely on the number of taxa present in an ecosystem.
• Therefore, the idea that strongly emerges is the need to reconsider conservation priorities and to move toward an interdisciplinary approach through the creation of science-policy partnerships.
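As a rough illustration of why a species count alone can mislead, the sketch below compares species richness with the Shannon diversity index for two grid cells. The occurrence records, cell names, and helper functions are hypothetical, and the Shannon index is only one of several measures that could be used here.

```python
import math
from collections import Counter

def richness(species_list):
    """Number of distinct species observed in a cell."""
    return len(set(species_list))

def shannon_index(species_list):
    """Shannon diversity H = -sum(p_i * ln p_i) over species proportions."""
    counts = Counter(species_list)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical occurrence records: grid cell -> observed species names.
cells = {
    # One dominant species plus three species seen once each.
    "cell_A": ["sp1"] * 97 + ["sp2", "sp3", "sp4"],
    # The same four species, evenly represented.
    "cell_B": ["sp1"] * 25 + ["sp2"] * 25 + ["sp3"] * 25 + ["sp4"] * 25,
}

summary = {
    name: (richness(obs), round(shannon_index(obs), 3))
    for name, obs in cells.items()
}
for name, (r, h) in summary.items():
    print(name, "richness =", r, "Shannon H =", h)
```

Both cells have the same richness, yet the evenly populated cell has a much higher Shannon index, which is one way to see that a raw taxon count hides structure.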
Is it just point distributions? Have a HYPOTHESIS
Other Examples
IUCN Red Lists?
Other Examples
Biodiversity Data Characteristics
• Voluminous
• Incremental
• Complex
• Scalability
• Heterogeneity
• Has a taxonomy type
• Distribution: the Global Biodiversity Information Facility (GBIF) currently holds over 577 million occurrence records in the areas of climate change, human health, food and security, biofuels, ecosystem services.
• Genetic/genomic information: environmental genomics, including metagenomics and metabarcoding
Heterogeneous Data Types
Technical & Non-technical Priority Areas for Biodiversity Informatics Research
Technical Priority Areas:
• Deep analysis, to improve data understanding;
• Optimized architectures for analytics of data-at-rest and data-in-motion;
• Mechanisms for managing privacy … to enable the vast amounts of data which are not open data (and never can be open data) to be part of the Data Value Chain;
• Advanced visualization and user experience;
• Data management engineering.
Non-technical Priority Areas:
• Skills development;
• Business models and ecosystems;
• Policy, regulation and standardization;
• Social perceptions.
Big Data Analytics Life Cycle
[Figure: Scientific Data Management life cycle. Stages: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze, Publish; annotated with metadata.]
Scientist
Visualization
Visualization
E-Bird
Big Data Analytics Life Cycle
How do I assure the quality of my data?
How do I choose my algorithm?
Which type of architecture do I use?
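One way to approach the data-quality question is a set of simple rule-based checks applied before any analysis. The sketch below assumes occurrence records stored as dictionaries with Darwin Core style field names (`scientificName`, `decimalLatitude`, `decimalLongitude`). The records and the `check_record` helper are hypothetical, and the rules are deliberately minimal.

```python
def check_record(record):
    """Return a list of quality problems found in one occurrence record."""
    problems = []
    name = (record.get("scientificName") or "").strip()
    if not name:
        problems.append("missing scientificName")
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    else:
        if not -90 <= lat <= 90:
            problems.append("latitude out of range")
        if not -180 <= lon <= 180:
            problems.append("longitude out of range")
        if lat == 0 and lon == 0:
            problems.append("suspicious 0,0 coordinates")
    return problems

# Hypothetical records, some with deliberate defects.
records = [
    {"scientificName": "Acacia nilotica", "decimalLatitude": 27.18, "decimalLongitude": 31.17},
    {"scientificName": "", "decimalLatitude": 27.18, "decimalLongitude": 31.17},
    {"scientificName": "Vulpes zerda", "decimalLatitude": 127.0, "decimalLongitude": 31.17},
    {"scientificName": "Vulpes zerda", "decimalLatitude": None, "decimalLongitude": None},
]

report = [check_record(r) for r in records]
clean = [r for r, problems in zip(records, report) if not problems]
print(len(clean), "of", len(records), "records passed all checks")
```

Keeping the problems as a list per record, rather than silently dropping bad rows, makes it possible to report back to the data provider what needs fixing.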
iDigBio
iDigBio
Big Data Challenges for ML and EDA
• Format variation of the raw data
• Noisy and poor-quality data
• Fast-moving streaming data
• Trustworthiness of the data analysis
• Highly distributed input sources
• High dimensionality
• Scalability of algorithms
Part I: Machine Learning Approaches
• One example is the use of deep learning.
• Deep learning algorithms lead to abstract representations because more abstract representations are often constructed based on less abstract ones.
• An important advantage of more abstract representations is that they can be invariant to local changes in the input data.
• Learning such invariant features is an ongoing major goal in pattern recognition.
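A small sketch of what invariance to local changes can mean in practice: a max-pooling step, a common building block in deep networks, gives the same output when a feature shifts slightly inside a pooling window. The signal values and the `max_pool` helper are illustrative assumptions, not taken from the slides.

```python
def max_pool(signal, window):
    """Non-overlapping max pooling over a 1-D list."""
    return [max(signal[i:i + window]) for i in range(0, len(signal), window)]

original = [0, 0, 1, 0, 0, 0, 0, 0]  # a feature detected at position 2
shifted = [0, 0, 0, 1, 0, 0, 0, 0]   # the same feature, moved one step right

pooled_original = max_pool(original, window=4)
pooled_shifted = max_pool(shifted, window=4)

# The raw inputs differ, but the pooled (more abstract) representation does not.
print(original != shifted, pooled_original == pooled_shifted)
```

The invariance only holds while the shift stays inside one pooling window; moving the feature across a window boundary changes the pooled output.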
Example
An image is composed of different sources of variation, such as light, object shapes, and object materials. The abstract representations provided by deep learning algorithms can separate the different sources of variation in the data.
Example of a DNN
Learning the parameters of a deep architecture, such as a neural network with many hidden layers, is a difficult optimization task.
• Google’s “word2vec” tool is a technique for automated extraction of semantic representations from Big Data.
• This tool takes a large-scale text corpus as input and produces the word vectors as output.
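To make the input side concrete, the sketch below generates (center, context) training pairs in the style of the skip-gram variant of word2vec, which learns vectors by predicting nearby words. The toy corpus, the window size, and the `skipgram_pairs` helper are assumptions for illustration; training the actual vectors is left to a word2vec implementation.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical one-sentence corpus.
corpus = "nile perch inhabits the nile river".split()
pairs = skipgram_pairs(corpus, window=1)
print(pairs[:4])
```

Words that share many contexts end up with similar vectors after training, which is what makes the output usable as a semantic representation.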
Deep Learning
• Extracting complex patterns from massive volumes of data,
• Semantic indexing,
• Data tagging,
• Fast information retrieval
Deep Learning in Biodiversity Distribution (Wildlife Monitoring)
• Affordable and effective measures of conservation outcomes.
• Improve the quality of conservation monitoring and to scale monitoring programs to meet the global need.
• Extract meaningful information from the torrent of new sensor data, and improve the adaptive management of natural systems.
Case Studies
• Monitoring invasive species
• Detecting rare species
• Monitoring population through time
• Detecting bird vocalization
• Detecting fish underwater
Empower biologists to analyze petabytes of sensor data from a network of remote microphones and cameras.
This system, which is being used to monitor endangered species and ecosystems around the globe, has enabled an order of magnitude improvement in the cost effectiveness of such projects.
This approach can be expanded to encompass a greater variety of sensor sources, such as drones, to monitor animal populations and habitat quality, and to actively deter wildlife from hazardous structures.
Part II: The HOW-TO … Practice
Using Spark for BioDiversity Data
• Processing snapshots of biodiversity data providers’ entire datasets locally is an important capability.
• It allows broad questions to be asked across multiple data providers without needing to wait for providers to develop integrations or interfaces with each other.
• The providers’ web interfaces and application programming interfaces (APIs) no longer limit the way data is presented.
• Data can be processed at a much higher rate locally instead of through APIs.
Spark
• In 2014, Spark became an Apache Software Foundation top-level project, and its popularity as a big data processing engine has taken off.
• This implementation of the map-reduce pattern of data processing is much simpler to install and use than its industry-favorite predecessor, Hadoop.
• With Spark, arbitrary querying, joining, and reducing operations on and between entire biodiversity datasets can be done with very little code on a desktop computer or commonly available cloud computing resources.
• Machine Learning Library (MLlib)
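The map-reduce pattern mentioned above can be sketched without a cluster. The plain-Python example below counts occurrence records per species by mapping each record to a (key, 1) pair and then reducing by key; in Spark the same shape would typically be written with RDD operations such as `map` and `reduceByKey`. The records are hypothetical, and this illustrates the pattern rather than Spark code.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical occurrence records as (species, country code) tuples.
occurrences = [
    ("Vulpes zerda", "EG"),
    ("Acacia nilotica", "EG"),
    ("Vulpes zerda", "TN"),
    ("Vulpes zerda", "EG"),
]

# Map step: emit one (key, value) pair per record.
mapped = [(species, 1) for species, _country in occurrences]

# Shuffle step: group the emitted values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce step: combine the values of each key with an associative function.
counts = {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}
print(counts)
```

Because the reduce function is associative, the grouping and combining can be split across many machines, which is what an engine such as Spark or Hadoop automates.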
iDigBio
• iDigBio – 44 million record datasets.
• Sparkonomy, an iDigBio tool, was developed to join tokenized taxon names from iDigBio to GBIF’s backbone taxonomy in a few minutes on a desktop computer.
• Effechecka from EOL is an early-phase web application that uses Spark jobs to construct checklists for taxon and spatial queries from iDigBio occurrence information.
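The kind of join described for Sparkonomy can be pictured with a small in-memory sketch: raw taxon names are tokenized into a (genus, species) key and looked up in a backbone table. Everything here is hypothetical, including the records, the backbone entries and their identifiers, and the `tokenize` helper; it does not reproduce Sparkonomy's actual code or GBIF's data.

```python
def tokenize(name):
    """Reduce a raw taxon name to a lowercase (genus, species) key, or None."""
    parts = name.strip().lower().split()
    return tuple(parts[:2]) if len(parts) >= 2 else None

# Hypothetical backbone taxonomy: (genus, species) -> backbone identifier.
backbone = {
    ("vulpes", "zerda"): "backbone-001",
    ("acacia", "nilotica"): "backbone-002",
}

# Hypothetical raw names as they might appear in occurrence records.
raw_names = ["  Vulpes  zerda ", "ACACIA NILOTICA", "Unknown"]

matched, unmatched = {}, []
for raw in raw_names:
    key = tokenize(raw)
    if key in backbone:
        matched[raw] = backbone[key]
    else:
        unmatched.append(raw)

print(len(matched), "matched;", len(unmatched), "unmatched")
```

At the scale of tens of millions of records the dictionary lookup would be replaced by a distributed join, but the normalize-then-match logic stays the same.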
Perform interactive analytics on observational scientific data
Grid or Many Task Software, Hadoop, Spark
Data Storage: HDFS, Hbase, File Collection
Streaming data for weather
Science Analysis Code, Mahout, R
Transport batch of data to primary analysis data system
Record Scientific Data in “field”
Local Accumulate and initial computing
Direct Transfer
Examples include Remote Sensing, Astronomy and Bioinformatics