Presentation

Presented by Sonam (10103470)

description

Predictions on Stack Overflow

Transcript of Presentation

Page 1: Presentation

Presented by Sonam (10103470)

Page 2: Presentation

Stack Overflow is a question and answer site for professional and enthusiast programmers.

Page 3: Presentation

Tags are user-generated labels/keywords for entities that summarize the features of the questions from different views

Page 4: Presentation

Questions that are not related to programming topics are marked ‘closed’ by experienced users and community moderators

Page 5: Presentation

Questions that are deleted/locked by experienced users and community moderators

Page 6: Presentation

•Tag recommendation for questions being posted on Stack Overflow

•Prediction of ‘closed’ questions at post creation time

•Prediction of ‘deleted’ questions after deletion

Page 7: Presentation

•Easier question posting

•Better organization of the site

Page 8: Presentation

•Feedback to question asker

•Community moderator assistance

Page 9: Presentation

•Feedback to Moderator/owner

•Whether a question is worth deleting or should remain undeleted

Page 10: Presentation
Page 11: Presentation
Page 12: Presentation

Database Snapshot

Page 13: Presentation
Page 14: Presentation

•TF-IDF WEIGHTING

•NAÏVE BAYES CLASSIFICATION

•SVM CLASSIFICATION

•K-NEAREST NEIGHBOR CLASSIFICATION
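To make two of the listed techniques concrete, here is a minimal sketch that combines TF-IDF weighting with a k-nearest-neighbor vote to suggest tags. It is not the presenter's implementation: the toy questions, the whitespace tokenizer, the cosine similarity measure, and the function names (`tfidf_vectors`, `recommend_tags`) are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: tf*idf} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def vectorize(text, idf):
    """Weight a new question with an existing idf table (unknown terms are dropped)."""
    toks = tokenize(text)
    tf = Counter(toks)
    return {t: (c / len(toks)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_tags(question, docs, doc_tags, k=2, top_n=3):
    """Let the k most similar known questions vote for their tags."""
    vectors, idf = tfidf_vectors(docs)
    q = vectorize(question, idf)
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter()
    for i in ranked[:k]:
        votes.update(doc_tags[i])
    return [tag for tag, _ in votes.most_common(top_n)]

# Hypothetical toy corpus, not data from the presentation.
docs = [
    "how to read a csv file in python pandas",
    "python list comprehension syntax error",
    "java null pointer exception in arraylist",
    "java string compare equals",
]
doc_tags = [["python", "pandas"], ["python"], ["java"], ["java", "string"]]

suggested = recommend_tags("python csv parsing", docs, doc_tags, k=2)
print(suggested)
```

Terms that occur in fewer questions receive a larger IDF weight, so rare words such as "csv" influence the neighbor ranking more than common ones.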

Page 15: Presentation

•Flow chart of tag prediction

Page 16: Presentation

The following graph compares accuracy with and without feedback.

Page 17: Presentation

Represents the accuracy for each post when recommending the top 1, top 2, top 3, top 4, and top 5 tags
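One plausible way to compute such per-k accuracies is sketched below. The slides do not define what counts as a correct recommendation, so this sketch assumes a post is a hit when at least one of its top-k recommended tags matches one of its actual tags. The function name `top_k_accuracy` and the sample data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_tags, k):
    """Fraction of posts whose top-k recommended tags contain a true tag.

    Assumption: one matching tag is enough for a post to count as a hit.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_tags)
        if set(ranked[:k]) & set(truth)
    )
    return hits / len(true_tags)

# Hypothetical ranked recommendations and ground-truth tags for three posts.
ranked_predictions = [
    ["python", "pandas", "csv"],
    ["java", "string", "android"],
    ["c++", "pointers", "memory"],
]
true_tags = [["pandas"], ["java"], ["memory", "c"]]

accuracy_by_k = {k: top_k_accuracy(ranked_predictions, true_tags, k)
                 for k in (1, 2, 3)}
print(accuracy_by_k)
```

Under this definition accuracy can only stay level or rise as k grows, because a larger k only adds candidate tags.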

Page 18: Presentation

Accuracy of the full tag recommendation system as the number of recommended tags varies

Page 19: Presentation
Page 20: Presentation

•RANDOM FOREST CLASSIFIER

•ADABOOST CLASSIFIER

•EXTRATREES CLASSIFIER

Page 21: Presentation

•Score of post

•User’s reputation

•Age of user account

•Score of other posts of user

•Post content
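As a rough illustration of how the features listed above could be turned into classifier input, the sketch below maps one post to a numeric vector. The `Post` field names are hypothetical, and reducing "post content" to a word count and a question-mark count is an assumption; the slides do not say how the content was encoded.

```python
from dataclasses import dataclass

@dataclass
class Post:
    # Hypothetical field names; the actual database schema is not shown.
    score: int
    user_reputation: int
    account_age_days: int
    other_post_scores: list
    body: str

def extract_features(post):
    """Map a post to a flat numeric feature vector."""
    others = post.other_post_scores
    mean_other = sum(others) / len(others) if others else 0.0
    words = post.body.split()
    return [
        float(post.score),             # score of post
        float(post.user_reputation),   # user's reputation
        float(post.account_age_days),  # age of user account
        mean_other,                    # mean score of the user's other posts
        float(len(words)),             # post content: word count (assumption)
        float(post.body.count("?")),   # post content: question marks (assumption)
    ]

example = Post(
    score=-2,
    user_reputation=15,
    account_age_days=3,
    other_post_scores=[1, -1, 0],
    body="Why does my code crash? Please help",
)
features = extract_features(example)
print(features)
```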

Page 22: Presentation

•Flow chart of Closed Question

Page 23: Presentation

Graph shows the importance of different features based on Random Forest

Page 24: Presentation

Graph shows the importance of different features based on AdaBoost

Page 25: Presentation

Graph shows the importance of different features based on ExtraTrees

Page 26: Presentation

The following graph compares accuracy for different numbers of features

Page 27: Presentation

The following graph compares accuracy for different numbers of estimators
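To show why the number of estimators can matter, here is a from-scratch sketch of AdaBoost with decision stumps, evaluated at several estimator counts. It is a teaching example under stated assumptions, not the presenter's code: the one-feature toy data set is invented, labels are encoded as +1/-1, and the reported figure is training accuracy on that toy set, not the accuracy plotted in the slides.

```python
import math

def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) with the lowest error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost_fit(X, y, n_estimators):
    """Train n_estimators stumps, reweighting misclassified samples each round."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_estimators):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    correct = sum(adaboost_predict(model, r) == yi for r, yi in zip(X, y))
    return correct / len(y)

# Invented toy data: no single threshold separates the two classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

results = {n: accuracy(adaboost_fit(X, y, n), X, y) for n in (1, 2, 3)}
print(results)
```

On this toy set a single stump cannot separate the classes, while a weighted vote of three stumps can, which is the kind of effect an accuracy-versus-estimators plot is meant to expose.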

Page 28: Presentation

Comparison of the three classifiers

Based on the number of closed questions found:

Page 29: Presentation

Accuracy comparison:

On the basis of estimators:

Page 30: Presentation

Accuracy comparison:

On the basis of training set size:

Page 31: Presentation
Page 32: Presentation

•RANDOM FOREST CLASSIFIER

•ADABOOST CLASSIFIER

•EXTRATREES CLASSIFIER

Page 33: Presentation

•Score of post

•User’s reputation

•Age of user account

•Score of other posts of user

•Post content

Page 34: Presentation

•Flow chart of Deletion

Page 35: Presentation

Though deleted questions are mostly relevant, they are removed by reputed authors who do so to protect their reputation on Stack Overflow.

Page 36: Presentation

Accuracy comparison:

On the basis of estimators:

Page 37: Presentation

Accuracy comparison:

On the basis of training set size:

Page 38: Presentation

•Tag recommendation has been implemented with and without feedback. Accuracy was better with feedback.

•‘Closed’ question prediction has been implemented with three different classifiers and with different numbers of features and estimators. AdaBoost achieved the best accuracy.

•Deleted question prediction was likewise implemented with all three classifiers. The results suggest that many questions are worth deleting, but some should be restored.

Page 39: Presentation

• Increase the accuracy of our algorithms.

• Predict trends on Stack Overflow.

• Predict and find unanswered questions.

• Predict the quality of answers with non-textual features.

Page 40: Presentation