Posted on 14-Dec-2015

Intelligent Classifier

STI Innsbruck  &  Excogito

User-friendly Semi-Automatic Product Classification System

People

1. Supervision: Marcus Spies

2. People: Sigurd Harand, Christian Leibold

3. Contact person: Christian Leibold, christian.leibold@deri.at

4. Industrial cooperation with Excogito, Maksym Korotkiy

© 2002 - 2007 STI Innsbruck  &  Excogito. All Rights Reserved.

Outline

1. Context: Product Classification Problem

2. Project intro, positioning and objectives

3. Workflow driven approach

4. GoldenBullet shooting market

a) Improved software architecture

• Java XML Registries

• User taxonomies

b) Improved (re-)usability and quality

5. Conclusions and Future

6. Online Demo

Product Classification Problem

1. E-Catalogs contain thousands of cryptic product descriptions

1. CAREPAQ BUREAU PROSIGNIA3YRS/SITE/J+1/TEL

2. TRAINING ACT/ASEEXCEPT TRU64UNIX and OPENVMS

3. ….

2. Businesses have to deal with thousands of e-catalogs

3. Classification standards have tens of thousands of product categories (21192 in UNSPSC 8.04)

4. The result: high manual classification effort is required

GB IC Positioning and Objectives

• Many standards (e.g. UNSPSC, eCl@ss, ebXML, GPC, …)

– ~20,000 classes

– millions of products

• Current state of the art (SOA): outsourcing to low-salary countries, or the use of (counterproductive) low-quality software tools with 25% failure rates

• The GoldenBullet 2 research prototype offered unique "semi-automatic" functionality. It supports classification through manual intervention and, by "learning" from that intervention, reaches a classification rate of 95% and speeds up the process by a factor of up to 60

• Developing GB IC into a marketable product will create innovative added value and help reduce the outsourcing of labor.

Project intro

1. Project won ProIT funding (cooperation between transIT and CAST)

2. Duration: 1st September 2007 - 31st August 2008

3. Objectives:

• Submission of a debugged, robust and marketable GB IC prototype

• Extended Usability and Robustness

• Extended Reusability

4. Completed tasks & Status:

• Worked out a contract for handling IPR between the stakeholders (UIBK, Excogito NL, BvW Global Pty)

• Including foundation regulations for marketing and selling

• First report, with the technical specification as deliverable, accepted by CAST and transIT

• Cooperation with industrial partner Excogito

Workflow Driven Approach

1. GoldenBullet semi-automatically classifies product descriptions into a standard (e.g. UNSPSC) by employing

1. NLP techniques to preprocess descriptions (stemming)

2. Clustering methods to generate representative subsets of the e-catalog (currently k-means)

3. Machine learning techniques to train the system and automatically generate ranked classification options (currently Naïve Bayes)

2. The user approves or corrects the proposed classification

3. GoldenBullet constantly learns from the user's choices and updates the classification options
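A small multinomial Naive Bayes model in Java can illustrate the three workflow steps: preprocessing, ranked proposals, and learning from user decisions. This is a minimal sketch under stated assumptions, not GoldenBullet's implementation. The class name `NaiveBayesRanker`, the crude suffix stripping that stands in for stemming, and the add-one smoothing are all illustrative choices. Category codes are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: Naive Bayes that ranks proposals and learns from each confirmed classification. */
class NaiveBayesRanker {
    private final Map<String, Integer> docCounts = new HashMap<>();                // descriptions per category
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>(); // token frequencies per category
    private final Map<String, Integer> totalTokens = new HashMap<>();              // token total per category
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Lowercases, splits on non-alphanumerics and strips a few suffixes (a crude stand-in for stemming). */
    static List<String> preprocess(String description) {
        List<String> tokens = new ArrayList<>();
        for (String raw : description.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) continue;
            String token = raw;
            for (String suffix : new String[] {"ing", "es", "s"}) {
                if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                    token = token.substring(0, token.length() - suffix.length());
                    break;
                }
            }
            tokens.add(token);
        }
        return tokens;
    }

    /** Updates the counts with one user-approved (or user-corrected) classification. */
    void learn(String description, String category) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : preprocess(description)) {
            counts.merge(token, 1, Integer::sum);
            totalTokens.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns up to k categories ordered by log-posterior score, best first. */
    List<String> rank(String description, int k) {
        List<String> tokens = preprocess(description);
        int vocabSize = Math.max(1, vocabulary.size());
        Map<String, Double> scores = new HashMap<>();
        for (String category : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = tokenCounts.get(category);
            int total = totalTokens.getOrDefault(category, 0);
            for (String token : tokens) {
                // Add-one (Laplace) smoothing: unseen tokens lower the score instead of zeroing it.
                score += Math.log((counts.getOrDefault(token, 0) + 1.0) / (total + vocabSize));
            }
            scores.put(category, score);
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In this sketch, `rank` supplies the options shown to the user for approval. Each confirmed or corrected choice is passed back to `learn`, so later proposals reflect the user's decisions.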

Architecture

Mapping the workflow to functional modules:

• Separation of concerns

• Workflow support to be implemented in the GUI
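"Separation of concerns" could be realized as narrow module interfaces behind a workflow object that the GUI drives. The Java sketch below is hypothetical: the names `Preprocessor`, `Classifier`, `ClassificationWorkflow` and the toy `FrequencyClassifier` do not come from GoldenBullet. The sketch shows that a real model, such as a Naive Bayes classifier, could replace the toy one without touching the workflow or GUI code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Turns a raw product description into tokens (e.g. by stemming). */
interface Preprocessor {
    List<String> tokens(String description);
}

/** Produces ranked proposals and learns from user feedback. */
interface Classifier {
    List<String> propose(List<String> tokens, int k);
    void learn(List<String> tokens, String category);
}

/** Workflow layer driven by the GUI: the only place that knows the order of the steps. */
class ClassificationWorkflow {
    private final Preprocessor preprocessor;
    private final Classifier classifier;

    ClassificationWorkflow(Preprocessor preprocessor, Classifier classifier) {
        this.preprocessor = preprocessor;
        this.classifier = classifier;
    }

    /** Options shown to the user for approval. */
    List<String> proposals(String description, int k) {
        return classifier.propose(preprocessor.tokens(description), k);
    }

    /** Records the category the user approved or corrected. */
    void confirm(String description, String category) {
        classifier.learn(preprocessor.tokens(description), category);
    }
}

/** Toy classifier for illustration only: ranks categories by how often they were confirmed. */
class FrequencyClassifier implements Classifier {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public List<String> propose(List<String> tokens, int k) {
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // most frequent first
            return byCount != 0 ? byCount : a.compareTo(b);              // ties broken alphabetically
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    @Override
    public void learn(List<String> tokens, String category) {
        counts.merge(category, 1, Integer::sum);
    }
}
```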

Architecture

Enhanced Usability and Robustness:

- Sort and search functions for both the catalogue and the classification schema

- Multi-language GUI and contextual help system

- Support for catalogues of up to 10^6 entries

- Action logging enables undo/redo for classification and user workflow

- Implementation of strategies to avoid over-fitting
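The classic two-stack undo/redo scheme can satisfy the action-logging requirement. Below is a minimal Java sketch. It assumes a classification is simply a map from a hypothetical item id to a category code. A real log would also cover workflow steps such as imports and training runs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical action log that supports undo/redo of classification decisions. */
class ClassificationLog {
    /** One logged assignment, remembering the category it replaced. */
    private static final class Action {
        final String itemId;
        final String before; // null if the item was previously unclassified
        final String after;

        Action(String itemId, String before, String after) {
            this.itemId = itemId;
            this.before = before;
            this.after = after;
        }
    }

    private final Map<String, String> assignments = new HashMap<>();
    private final Deque<Action> undoStack = new ArrayDeque<>();
    private final Deque<Action> redoStack = new ArrayDeque<>();

    /** Assigns a category, logs the action and discards any redo history. */
    void assign(String itemId, String category) {
        undoStack.push(new Action(itemId, assignments.get(itemId), category));
        set(itemId, category);
        redoStack.clear();
    }

    /** Reverts the most recent action; returns false if there is nothing to undo. */
    boolean undo() {
        Action action = undoStack.poll();
        if (action == null) return false;
        set(action.itemId, action.before);
        redoStack.push(action);
        return true;
    }

    /** Re-applies the most recently undone action; returns false if there is none. */
    boolean redo() {
        Action action = redoStack.poll();
        if (action == null) return false;
        set(action.itemId, action.after);
        undoStack.push(action);
        return true;
    }

    String categoryOf(String itemId) {
        return assignments.get(itemId);
    }

    private void set(String itemId, String category) {
        if (category == null) assignments.remove(itemId);
        else assignments.put(itemId, category);
    }
}
```

Each logged action stores both the old and the new category. Undo and redo therefore never need to recompute anything; they only swap values back and forth.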

Architecture

Enhanced reusability:

- Software can be deployed in a Java Enterprise Edition environment (e.g. the Tomcat servlet container or the application servers of all major vendors)

- The Java EE XML Registry is used to store and access classification schema data

- Customer catalogue taxonomies can be stored and exchanged in a common format

- Documentation (software design, user guide, feature list), JUnit tests, Javadoc
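A common exchange format for customer taxonomies can be as simple as one XML element per category. The sketch below is an assumption for illustration. The element names `taxonomy` and `category` and the `code` attribute are invented. The code uses only the JDK's standard DOM parser and transformer classes, not a registry API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical format: <taxonomy><category code="...">title</category>...</taxonomy>. */
class TaxonomyXml {
    /** Serialises category codes and titles to an XML string. */
    static String toXml(Map<String, String> categories) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("taxonomy");
            doc.appendChild(root);
            for (Map.Entry<String, String> entry : categories.entrySet()) {
                Element category = doc.createElement("category");
                category.setAttribute("code", entry.getKey());
                category.setTextContent(entry.getValue()); // characters like & and < are escaped on output
                root.appendChild(category);
            }
            StringWriter out = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialise taxonomy", e);
        }
    }

    /** Parses the format written by toXml back into an ordered code-to-title map. */
    static Map<String, String> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("category");
            Map<String, String> categories = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element category = (Element) nodes.item(i);
                categories.put(category.getAttribute("code"), category.getTextContent());
            }
            return categories;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse taxonomy", e);
        }
    }
}
```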

Conclusions and Future

1. GoldenBullet is a semi-automatic product classification system that significantly reduces the effort of classifying e-catalogs

2. GoldenBullet IC considerably improves the (re-)usability and robustness of the system

3. In the future we aim at:

1. Implementation & validation of the technical specification

2. Generation of awareness (transIT)

3. Evaluation of further (possibly new) options of marketable exploitation

Online Demo

- Questions so far?

- http://www.gbclass.com

Thank you!

Further Questions?

Backup

The following slides are provided in case no internet connection is available or the demo is not reachable.

GoldenBullet IC GUI Outline

1. Wizards

1. Data Import/Export

2. Simple and Expert Training

3. Classification

2. E-Catalog and UNSPSC Browsers

“CI” Style

GoldenBullet IC has an integrated GUI style and a consistently designed, brand-like interface:

- Recognition as a product

- Usability through commonly used symbols

Data Import/Export Wizards

E-Catalog Browser

Expert Training

An automatically created, representative sub-catalog is provided to the user for semi-automatic classification
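The deck names k-means as the current clustering method for building the representative sub-catalog. Below is a minimal Java sketch of that idea. It assumes each catalogue item has already been turned into a numeric feature vector (e.g. term counts). For determinism, it uses the first k items as initial centroids; a production system would more likely use a randomized initialization such as k-means++. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: k-means, then one representative item (closest to the centroid) per cluster. */
class RepresentativeSampler {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) best = c;
        }
        return best;
    }

    static List<Integer> representatives(double[][] items, int k, int maxIterations) {
        if (k < 1 || k > items.length) throw new IllegalArgumentException("need 1 <= k <= number of items");
        int dim = items[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = items[c].clone(); // deterministic init: first k items
        int[] assignment = new int[items.length];
        Arrays.fill(assignment, -1);

        for (int iteration = 0; ; iteration++) {
            // Assignment step: attach every item to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < items.length; i++) {
                int c = nearestCentroid(items[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed || iteration == maxIterations) break;
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < items.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += items[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's old centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }

        // The member closest to each centroid becomes that cluster's representative.
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            int best = -1;
            for (int i = 0; i < items.length; i++) {
                if (assignment[i] != c) continue;
                if (best < 0 || squaredDistance(items[i], centroids[c]) < squaredDistance(items[best], centroids[c])) best = i;
            }
            if (best >= 0) result.add(best);
        }
        return result;
    }
}
```

The returned indices would identify the items offered to the user for training. Picking one item per cluster keeps the training set small while still covering the variety in the catalogue.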

Classification

Automatically created classification options are proposed to the user for approval

UNSPSC Browser

The Browser allows the user to locate an appropriate UNSPSC category and manually assign it to a product description
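Locating a category is easier when the browser can both search titles and show where a code sits in the hierarchy. UNSPSC codes have eight digits, two per level (segment, family, class, commodity), with unused lower levels written as zeros. Parent codes can therefore be derived directly from a code's digits. The Java sketch below assumes the schema has been loaded as (code, title) pairs. The class names and any sample data are hypothetical, and it needs Java 11 or later for `String.repeat`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper for locating categories in an 8-digit UNSPSC-style code list. */
class UnspscBrowser {
    static final class Category {
        final String code;
        final String title;

        Category(String code, String title) {
            this.code = code;
            this.title = title;
        }
    }

    private final List<Category> categories;

    UnspscBrowser(List<Category> categories) {
        this.categories = new ArrayList<>(categories);
    }

    /** Segment, family and class codes above an 8-digit code (2 digits per level, zero-padded). */
    static List<String> ancestors(String code) {
        if (!code.matches("\\d{8}")) throw new IllegalArgumentException("expected 8 digits: " + code);
        List<String> result = new ArrayList<>();
        for (int digits = 2; digits < 8; digits += 2) {
            String padded = code.substring(0, digits) + "0".repeat(8 - digits);
            // Skip the code itself (e.g. for a family code) and duplicate levels.
            if (!padded.equals(code) && !result.contains(padded)) result.add(padded);
        }
        return result;
    }

    /** Case-insensitive title search, results sorted by code. */
    List<Category> search(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        return categories.stream()
                .filter(c -> c.title.toLowerCase(Locale.ROOT).contains(needle))
                .sorted(Comparator.comparing((Category c) -> c.code))
                .collect(Collectors.toList());
    }
}
```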
