Big Data Challenges, Presented by Wes Caldwell at SolrExchange DC
![Page 1: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/1.jpg)
![Page 2: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/2.jpg)
Big Data Challenges
in the DoD and IC
Wes Caldwell, Chief Architect
Intelligent Software Solutions
![Page 3: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/3.jpg)
Topics
• Introduction to ISS
• The growth of data
• Our customers’ data environment
• The need for effective big-data management
• Search as the cornerstone of a big-data strategy
![Page 4: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/4.jpg)
About ISS
• Headquartered in Colorado Springs
• Other offices located in Washington DC, Hampton VA, Tampa FL, and Rome NY
• Innovative Solutions from “Space to Mud and Everything Between”
• Sole prime on multiple Air Force Research Labs programs IDIQ
• Currently Executing More Than 100 Software Development Projects
• Over 800 employees
• Strength in Solutions Development and Deployment
• Consistently Recognized as a Leader
• Recognized as a Deloitte Fast 50 Colorado company and a Deloitte Fast 500 company over eight consecutive years
• Three-time Inc. Magazine 500 winner
• 2009 Defense Company of the Year
![Page 5: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/5.jpg)
ISS Solution Space/Value Proposition
• Reusable and license-free to US Federal Government (GOTS)
• Committed to providing best ROI to our customers by integrating leading open-source solutions into our products and services
• Scalable from a single desktop solution to large distributed networks with thousands of users
• Customizable to each organization’s unique analytical and information technology infrastructure
• Operationally proven, secure and accredited for all major classified networks
![Page 6: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/6.jpg)
ISS Business Strategy
Government Off The Shelf (GOTS)
Commercial Off The Shelf (COTS)
Subject Matter Experts (SMEs)
• Low Barrier to Entry: No license fees to US Government Agencies
• Fast: Proven baseline provides immediate capability
• Turnkey: Highly customizable solutions can be implemented quickly with no development
• Solutions Oriented: Subject Matter Experts support implementation in each domain
• Low Cost: Cost of Adding Features is shared across large customer base; all customers benefit
Blending the best elements of each industry model to provide low risk, nonproprietary, high payoff solutions, fast!
![Page 7: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/7.jpg)
The growth of data
• Most electronic information is not relational, but unstructured (textual, binary) or semi-structured (spreadsheet, RSS feed, etc.)
– In 2007, the estimated information content of all human knowledge was 295 exabytes (295 million terabytes)
– Data production will be 44 times greater in 2020 than in 2009
• Approx. 35 zettabytes total (35 billion terabytes)
• A majority of the data produced in the future will be unstructured
– A tremendous amount of information and knowledge is dormant within unstructured data
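The unit conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units and assumes the 35 zettabyte figure is the 2020 value that the "44 times" comparison refers to; the 2009 baseline and the yearly growth rate are derived from those two numbers and are not stated on the slide.

```python
# Decimal (SI) storage units expressed in terabytes.
TB_PER_EXABYTE = 10**6     # 1 EB = 10^18 bytes = 10^6 TB
TB_PER_ZETTABYTE = 10**9   # 1 ZB = 10^21 bytes = 10^9 TB


def exabytes_to_terabytes(exabytes):
    return exabytes * TB_PER_EXABYTE


def zettabytes_to_terabytes(zettabytes):
    return zettabytes * TB_PER_ZETTABYTE


def implied_annual_growth(factor, years):
    """Constant yearly growth rate that compounds to `factor` over `years`."""
    return factor ** (1 / years) - 1


# Figures quoted on the slide.
stored_2007_tb = exabytes_to_terabytes(295)
total_2020_tb = zettabytes_to_terabytes(35)

# Derived values (assumption: 35 ZB is the 2020 figure, 44x larger than 2009).
baseline_2009_zb = 35 / 44
annual_growth = implied_annual_growth(44, 2020 - 2009)

print(f"2007: {stored_2007_tb:,} TB")
print(f"2020: {total_2020_tb:,} TB")
print(f"Implied 2009 baseline: {baseline_2009_zb:.2f} ZB")
print(f"Implied growth per year: {annual_growth:.0%}")
```

Under those assumptions, a 44-fold increase over eleven years corresponds to data production growing by roughly two fifths every year.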
![Page 8: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/8.jpg)
Our customers’ data environment
• Literally thousands of data sources/feeds
from a variety of strategic, national, and
tactical sources
– Media (documents, images, etc.)
– Human interactions
– Geospatial
– Open Source (News feeds, RSS)
– Imagery/Video
– Many more…
![Page 9: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/9.jpg)
How our analysts feel
![Page 10: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/10.jpg)
The need for effective “big-data” management
• Analysts are looking to extract knowledge from massive heterogeneous data sets, providing “actionable intelligence”
• Tactical environments absolutely demand effective management of data
– The time to live on the relevance of collected data can be very short
– Communications pipes aren’t as capable as those serving large CONUS-based data centers, so reduction of data based on tactical conditions (e.g. AOR, Problem Domain) is critical
• Search and Analytics are key enablers that let an analyst reliably search through large amounts of information and focus their efforts on a subset of that information for deeper analysis
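One way to picture the data reduction described above is a filter that runs before anything is pushed over a constrained link. This is a minimal sketch, not the ISS implementation: the `Report` and `BoundingBox` types, the field names, and the sample values are all hypothetical, and the area of responsibility (AOR) is simplified to a latitude/longitude box that does not handle the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical AOR, simplified to a lat/lon rectangle."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


@dataclass(frozen=True)
class Report:
    """Hypothetical collected item with its own time to live."""
    report_id: str
    domain: str
    lat: float
    lon: float
    collected_at: datetime
    ttl: timedelta


def reduce_for_tactical_node(reports, aor, domains, now):
    """Keep reports inside the AOR, in a wanted problem domain, and not expired."""
    return [
        r for r in reports
        if aor.contains(r.lat, r.lon)
        and r.domain in domains
        and now - r.collected_at <= r.ttl
    ]


now = datetime(2013, 1, 1, 12, 0)
aor = BoundingBox(min_lat=10.0, min_lon=40.0, max_lat=15.0, max_lon=50.0)
reports = [
    Report("r1", "maritime", 12.0, 45.0, now - timedelta(hours=1), timedelta(hours=6)),
    Report("r2", "maritime", 12.5, 44.0, now - timedelta(days=2), timedelta(hours=6)),
    Report("r3", "maritime", 30.0, 45.0, now - timedelta(hours=1), timedelta(hours=6)),
    Report("r4", "logistics", 12.0, 45.0, now - timedelta(hours=1), timedelta(hours=6)),
]

kept = reduce_for_tactical_node(reports, aor, {"maritime"}, now)
print([r.report_id for r in kept])
```

Only `r1` survives here: `r2` has outlived its time to live, `r3` falls outside the AOR, and `r4` belongs to a problem domain the node did not ask for.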
![Page 11: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/11.jpg)
Search IS the cornerstone of an effective big-data strategy
Structured Content / Semi-Structured Content / Unstructured Content
Content Acquisition
Content Cache (Haystacks)
Tenets
• Connector architecture
• Data normalization
• Data staging
• Data Compartmenting (Multiple Haystacks)
Search/Discovery
Content Index
Tenets
• Optimized Index of Content for Search and Discovery of Big Data
• Analyst Topics that “Shrink the Haystack”
• Search Features (Facets, Auto-Complete, Tagging, Comments, etc.)
• Semantic (Synonym) Search based on pluggable taxonomies
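The Search/Discovery tenets can be illustrated with a toy in-memory index. The sketch below is not how a production search engine works internally; it only shows the idea of expanding query terms through a swappable taxonomy and then counting facet values over the hits. The taxonomy, the documents, and every function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pluggable taxonomy: term -> synonyms that should also match.
TAXONOMY = {
    "vehicle": {"truck", "car"},
    "truck": {"vehicle"},
}

# Hypothetical corpus.
DOCS = [
    {"id": 1, "source": "news", "text": "White truck seen near the bridge"},
    {"id": 2, "source": "report", "text": "Vehicle convoy left the depot"},
    {"id": 3, "source": "news", "text": "Bridge repairs finished early"},
]


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """Inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in tokenize(doc["text"]):
            index[token].add(doc["id"])
    return index


def expand(term, taxonomy):
    """A term plus whatever synonyms the plugged-in taxonomy lists for it."""
    return {term} | taxonomy.get(term, set())


def search(query, index, taxonomy):
    """AND across query terms, OR across each term's synonyms."""
    result = None
    for term in tokenize(query):
        matches = set()
        for variant in expand(term, taxonomy):
            matches |= index.get(variant, set())
        result = matches if result is None else result & matches
    return result or set()


def facet_counts(doc_ids, docs, field):
    """Count the values of one field across the matching documents."""
    return Counter(d[field] for d in docs if d["id"] in doc_ids)


index = build_index(DOCS)
hits = search("vehicle", index, TAXONOMY)
facets = facet_counts(hits, DOCS, "source")
narrowed = search("vehicle bridge", index, TAXONOMY)
print(sorted(hits), dict(facets), sorted(narrowed))
```

Swapping `TAXONOMY` for a different dictionary changes which documents a query reaches without touching the index, which is the sense in which the taxonomy is pluggable in this sketch. Adding a second query term narrows the result set, a small-scale version of "shrinking the haystack".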
NLP Pipeline
Semantic Enrichment
• Categorization
• Named Entity Recognition
• Clustering
• Gazetteers
Tenets
• “Domain Spaces” that support pluggable entity recognition and categorization
• Continuous feedback loop that improves the system over time with analyst input
• Lexicon-based analytics that allow targeted categorization across the corpus of data
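As a rough illustration of these enrichment tenets, the sketch below bundles a gazetteer and a lexicon into one hypothetical "domain space", uses the gazetteer for simple entity recognition, scores categories by lexicon overlap, and lets an analyst add a term as a stand-in for the feedback loop. All names, entries, and the scoring rule are assumptions made for the example; real pipelines would typically combine this with statistical models.

```python
import re

# Hypothetical domain space: gazetteer for entities, lexicon for categories.
MARITIME_DOMAIN = {
    "gazetteer": {
        "port of aden": "LOCATION",
        "gulf of aden": "LOCATION",
        "mv example": "VESSEL",
    },
    "lexicon": {
        "piracy": {"hijack", "hijacked", "skiff", "ransom"},
        "logistics": {"cargo", "container", "resupply"},
    },
}


def find_entities(text, gazetteer):
    """Return (surface form, entity type, start offset) for each gazetteer hit."""
    hits = []
    for name, entity_type in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), entity_type, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


def categorize(text, lexicon):
    """Score each category by how many of its lexicon terms occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {category: len(tokens & terms) for category, terms in lexicon.items()}
    return {category: score for category, score in scores.items() if score > 0}


def add_analyst_term(domain, category, term):
    """Feedback loop stand-in: an analyst extends a category's lexicon."""
    domain["lexicon"].setdefault(category, set()).add(term.lower())


text = "MV Example was hijacked by a skiff crew in the Gulf of Aden."
entities = find_entities(text, MARITIME_DOMAIN["gazetteer"])
before = categorize(text, MARITIME_DOMAIN["lexicon"])
add_analyst_term(MARITIME_DOMAIN, "piracy", "crew")
after = categorize(text, MARITIME_DOMAIN["lexicon"])
print(entities, before, after)
```

Because the gazetteer and lexicon travel together in one dictionary, a different domain could be plugged in by passing a different dictionary to the same functions.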
Data Perspectives
Tenets
• Data Reduction into focused “Data Perspectives”
• Data perspectives stored in optimized formats (e.g. Graph, Time Series, Geo, etc.) for the questions being asked
• Leveraging industry-standard parallel processing frameworks for scalable analytics
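To make the idea of a "data perspective" concrete, the sketch below reduces the same set of enriched records into two small, question-specific shapes: a time series (how many records per day) and a co-occurrence graph (which entities appear together). The records and function names are hypothetical, and this runs in a single process; the slide's point about parallel processing frameworks is not implemented here, although both reductions are simple per-record counts of the kind that such frameworks distribute.

```python
from collections import Counter
from itertools import combinations

# Hypothetical enriched records, e.g. the output of an entity-recognition step.
RECORDS = [
    {"date": "2013-01-01", "entities": ["MV Example", "Gulf of Aden"]},
    {"date": "2013-01-01", "entities": ["Gulf of Aden", "Port of Aden"]},
    {"date": "2013-01-02", "entities": ["MV Example", "Gulf of Aden"]},
]


def time_series_perspective(records):
    """Time-series shape: number of records per day, in date order."""
    counts = Counter(record["date"] for record in records)
    return dict(sorted(counts.items()))


def graph_perspective(records):
    """Graph shape: edge weight = number of records mentioning both entities."""
    edges = Counter()
    for record in records:
        # Sort so that (a, b) and (b, a) land on the same edge key.
        for a, b in combinations(sorted(set(record["entities"])), 2):
            edges[(a, b)] += 1
    return edges


series = time_series_perspective(RECORDS)
graph = graph_perspective(RECORDS)
print(series)
print(dict(graph))
```

Each perspective is much smaller than the source records and answers one kind of question directly, which is the trade-off the tenets describe.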
![Page 12: Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC](https://reader036.fdocuments.in/reader036/viewer/2022062703/555097e6b4c90595208b46ea/html5/thumbnails/12.jpg)
How can Search help you?
Have a great conference!!!