HiTicket Web Service

Page 1:

HiTicket: Find Your Second-Hand Tickets

Jerry Li 2016/04/18

http://hiticket.tw/

Page 2:

Motivation

• It was very hard to get Sodagreen concert tickets last year.

• Tickets sold out in 10 minutes.

• We wanted a way to buy tickets released by other people.

Page 3:

Where to get second-hand tickets

• Social media: Facebook groups, Line

• PTT: Drama-Ticket board

• Auction sites: Yahoo

• Second-hand ticket sites: TIXINN, Ticketbis, CityTalk ...

Page 4:

Approach

• Set up a concert information website. Using concert open data from

• Crawl posts from PTT Drama-Ticket every minute. Information extraction: ticket type, price, number of tickets, seat location…

• Provide a ticket subscription service. Users receive an email when a ticket they subscribed to is released.
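The subscription step can be illustrated with a minimal sketch. It assumes a subscription is just an email address plus a concert name and that a crawled post already carries its detected concert. The `Subscription` and `Post` classes, the `build_notifications` helper, and the sender address are hypothetical names for illustration, not part of HiTicket. The messages are only built here, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Subscription:
    # Hypothetical shape: who wants to be notified about which concert.
    email: str
    concert: str


@dataclass
class Post:
    # Hypothetical shape of an already-extracted ticket post.
    concert: str
    title: str
    url: str


def build_notifications(post, subscriptions, sender="noreply@example.com"):
    """Return one email message per subscription that matches the post's concert."""
    messages = []
    for sub in subscriptions:
        if sub.concert != post.concert:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = sub.email
        msg["Subject"] = f"New ticket post for {post.concert}"
        msg.set_content(f"{post.title}\n{post.url}")
        messages.append(msg)
    return messages
```

Each returned message could then be handed to an SMTP client such as `smtplib.SMTP.send_message`.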

Page 5:

System Architecture (old)

[Architecture diagram. Components: User, Hi Ticket Web, Django REST server, Database, Information Extraction, PTT Web, YouTube (YouTube video), SMTP (email message). Intervals shown: 1 min, 1 day.]

Page 6:

IDCC Final project - HiTicket

• A website for people to see concert information and find second-hand tickets on PTT.

• Uses Python/Django to deploy an ETL system and a website on Google Compute Engine.

Page 7:

Result

Page 8:

My final project

TicketTW Concerts information web

Second-hand ticket platform

Page 9:

Redesign HiTicket System

1. Modify Extraction Pipeline
2. Rule-based Extraction
3. Web UI Upgrade

Page 10:

System Architecture (new)

[Architecture diagram. Post resource: PTT, CityTalk. Concert resource: Wikipedia, TicketTW, official website (updated manually). Components: Crawler, Information Extraction, ETL Database, Web Database (Concerts, Posts), Check alive. Intervals shown: 10 min, 1 day.]

Page 11:

Ticket Post Extraction Pipeline

[Pipeline diagram. Input: posts from PTT and CityTalk, collected by crawlers. Stages shown: Content Segmentation, Words Normalization, Type Extraction, Concert Detection, Price Extraction, Price Filter, Number of Tickets Extraction, Number of Tickets Correction. Output: structured data stored in the database.]
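As a rough illustration of how such a pipeline might be wired together, the sketch below runs a post (a plain dict) through a list of stage functions in order and stops as soon as a stage rejects it. The `run_pipeline` helper and the two toy stages are hypothetical stand-ins, and the stage order is an assumption, since the transcript does not preserve the arrows of the diagram.

```python
def run_pipeline(post, stages):
    """Pass a post through each stage in order; a stage returns None to reject it."""
    for stage in stages:
        post = stage(post)
        if post is None:
            return None
    return post


# Toy stand-in stages (hypothetical, not the real extraction rules).
def strip_raw_message(post):
    # Return a copy with surrounding whitespace removed from the raw message.
    return {**post, "raw": post["raw"].strip()}


def require_price(post):
    # Reject posts for which no price was extracted.
    return post if post.get("price") is not None else None


STAGES = [strip_raw_message, require_price]
```

Calling `run_pipeline({"raw": " text ", "price": 1500}, STAGES)` returns the cleaned post, while a post without a price comes back as `None`, which is one way to separate valid posts from the rest.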

Page 12:

Example: Posts from PTT

Stages illustrated: Content Segmentation, Type Extraction, Concert Detection

Fields: author, type, title, time, source, url, raw message, price, number
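The fields listed on this slide could be held in a small record type. The sketch below is one possible shape: the field names follow the slide (with `type` renamed to `post_type` to avoid shadowing the Python builtin), while the Python types and the optional price and number are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TicketPost:
    author: str
    post_type: str            # "type" on the slide
    title: str
    time: str                 # kept as the original string; format is not specified
    source: str               # e.g. "PTT" or "CityTalk"
    url: str
    raw_message: str
    price: Optional[int] = None    # None when no price could be extracted
    number: Optional[int] = None   # None when no ticket count could be extracted
```

`asdict(post)` turns a record into a plain dict, which is convenient for storing it or returning it from a REST endpoint.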

Page 13:

Rule-based Extraction

Words Normalization
Convert number words to digits:

ㄧ張、乙張、單張、兩張、1000元… (one ticket, one ticket, single ticket, two tickets, 1000 dollars)

=> 1張、2張、1000元…
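A minimal sketch of this normalization step follows. The word-to-digit mapping and the handling of full-width digits are assumptions based on the examples above. Count words are only rewritten when they directly precede the counter 張, so unrelated words containing the same characters are left alone.

```python
import re

# Assumed mapping from count words to digits (based on the slide's examples).
COUNT_WORDS = {"ㄧ": "1", "一": "1", "乙": "1", "單": "1", "兩": "2", "二": "2"}

# Assumed extra step: map full-width digits to ASCII digits.
FULLWIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")

# A count word immediately followed by the ticket counter 張.
COUNT_BEFORE_ZHANG = re.compile("([" + "".join(COUNT_WORDS) + "])張")


def normalize_words(text):
    """Rewrite count words before 張 as digits and full-width digits as ASCII."""
    text = text.translate(FULLWIDTH_DIGITS)
    return COUNT_BEFORE_ZHANG.sub(lambda m: COUNT_WORDS[m.group(1)] + "張", text)
```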

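Price Extraction
From the title and raw message.
Pattern match: 售價(.*), 票價(.*), 原價(.*)… (selling price, ticket price, original price)
Parse numbers:

票價:1500*2+限時掛號費 => 1500|2 (ticket price: 1500*2 + express registered mail fee)

Compare with the official prices [800,1500,2000,…]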
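The price rules above can be sketched as follows. The regular expression is an assumed concrete form of the listed patterns, and `extract_price_candidates` and `filter_by_official_prices` are hypothetical helper names. Keeping only candidates that equal an official price tier is one reading of "compare with official price"; the slide does not say how the comparison is done.

```python
import re

# Assumed concrete form of the patterns 售價(.*), 票價(.*), 原價(.*).
PRICE_PATTERN = re.compile(r"(?:售價|票價|原價)\s*[:：]?\s*(.*)")


def extract_price_candidates(text):
    """Return every integer found after a price keyword, in order of appearance."""
    numbers = []
    for match in PRICE_PATTERN.finditer(text):
        numbers.extend(int(n) for n in re.findall(r"\d+", match.group(1)))
    return numbers


def filter_by_official_prices(candidates, official_prices):
    """Keep only candidates that equal one of the official price tiers."""
    official = set(official_prices)
    return [c for c in candidates if c in official]


candidates = extract_price_candidates("票價:1500*2+限時掛號費")      # [1500, 2]
prices = filter_by_official_prices(candidates, [800, 1500, 2000])  # [1500]
```

In the slide's example the filter discards the 2, which is a ticket count rather than a price.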

Number Extraction
From the title and raw message.
Pattern match: (.*)張, 各(.*)張, 多張, 張數(.*), 數量(.*)… (N tickets, N each, multiple tickets, number of tickets, quantity)
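A sketch of the number rule, assuming the text has already been normalized so that counts appear as digits. The two regular expressions are an assumed subset of the listed patterns, and `extract_number` is a hypothetical helper name. Posts that only say 多張 (multiple tickets) carry no count, so the sketch returns `None` for them.

```python
import re
from typing import Optional

# Assumed concrete forms of 張數(.*) / 數量(.*) and (.*)張, tried in this order.
NUMBER_PATTERNS = [
    re.compile(r"(?:張數|數量)\s*[:：]?\s*(\d+)"),
    re.compile(r"(\d+)\s*張"),
]


def extract_number(text: str) -> Optional[int]:
    """Return the first ticket count matched by the patterns, or None."""
    for pattern in NUMBER_PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1))
    return None
```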

Page 14:

4446 posts from PTT Drama-Ticket and CityTalk

2016/03/07 to 2016/03/27 (20 days)

Page 15:

Valid posts / total posts: 797 / 4446

Price: precision 97.6%, recall 78.9%, F1 score 87.2%

Number of tickets: precision 93.1%, recall 94.3%, F1 score 93.6%
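The F1 score is the harmonic mean of precision and recall. The short function below recomputes it from the rounded percentages on this slide. The results come out near 87.3% for price and 93.7% for number of tickets, each within 0.1 point of the reported values, presumably because the slide's figures were computed from unrounded precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


price_f1 = f1_score(0.976, 0.789)    # price precision and recall from the slide
number_f1 = f1_score(0.931, 0.943)   # number precision and recall from the slide
```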

Page 16:

Example of a failed detection

Page 17:

Posts on FB group

Page 18:

Discussion

Manually updated concert database
• Concert information does not change frequently.
• It is not an unsupervised approach.

High detection rate?
• Posts from PTT mostly follow the rules.
• Can't handle multiple tickets in the same post.
• Can't handle unstructured posts on Facebook groups and CityTalk.

Value
• Provides a platform for users to find second-hand tickets.
• Can identify and filter out scalped tickets.

Page 19:

Web UI

Bootstrap/Bootswatch, Font Awesome, Google Fonts, Pinterest-style layout, Colorbox…

Page 20:

Example:

Page 21:

Q&A