Presentation

Post on 06-Jul-2015

Predictions on Stack Overflow

Transcript of Presentation

Presented by Sonam (10103470)

Stack Overflow is a question and answer site for professional and enthusiast programmers.

Tags are user-generated labels/keywords for entities that summarize the features of the questions from different views.

Questions that are not related to programming topics are marked ‘closed’ by experienced users and community moderators.

Questions that are deleted/locked by experienced users and community moderators

•Tag recommendation to questions being posted on Stack Overflow

•Prediction of ‘closed’ questions at post creation time

•Prediction of ‘deleted’ questions after deletion

•Easier question posting

•Better organization of the site

•Feedback to question asker

•Community moderator assistance

•Feedback to Moderator/owner

•Whether a question is worth deleting or should remain undeleted

Database Snapshot

•TF-IDF WEIGHTING

•NAÏVE BAYES CLASSIFICATION

•SVM CLASSIFICATION

•K-NEAREST NEIGHBOR CLASSIFICATION
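As a rough illustration of how TF-IDF weighting can feed a nearest-neighbor tag recommender, here is a minimal sketch in plain Python. It is not the presenter's implementation: the whitespace tokenizer, the toy posts, and the function names (`fit_idf`, `tfidf`, `recommend_tags`) are all assumptions made for the example, and only the k-nearest-neighbor variant is shown.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def fit_idf(docs):
    """Compute idf(t) = log(N / df(t)) over tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(doc, idf):
    """Weight each known term by (term frequency) * (inverse document frequency)."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend_tags(question, train, idf, k=3, top=2):
    """Let the k most similar training questions vote for their tags."""
    q = tfidf(tokenize(question), idf)
    scored = sorted(((cosine(q, vec), tags) for vec, tags in train),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, tags in scored[:k]:
        for tag in tags:
            votes[tag] += sim  # similarity-weighted vote
    return [tag for tag, _ in votes.most_common(top)]

# Hypothetical toy corpus of (question text, tags); not data from the slides.
posts = [
    ("how to sort a list in python", ["python", "sorting"]),
    ("python list comprehension syntax error", ["python", "list"]),
    ("java nullpointerexception when reading file", ["java", "exception"]),
    ("java read file line by line", ["java", "file-io"]),
]
docs = [tokenize(text) for text, _ in posts]
idf = fit_idf(docs)
train = [(tfidf(doc, idf), tags) for doc, (_, tags) in zip(docs, posts)]

recommended = recommend_tags("reverse a list in python", train, idf, k=2, top=1)
print(recommended)
```

Weighting votes by similarity is one design choice among several; a Naïve Bayes or SVM classifier trained on the same TF-IDF vectors would replace only the `recommend_tags` step.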

•Flow chart of tag prediction

The following graph shows the comparison of accuracy with and without feedback.

Represents accuracies corresponding to each post for the recommendation of the top 1, top 2, top 3, top 4, and top 5 tags

Accuracies of the full tag recommendation system as the number of recommended tags varies
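The slides do not say how accuracy is computed for the top 1 to top 5 recommendations. One common choice, assumed here, is to count a post as a hit when at least one of its top-k recommended tags appears among its true tags. The data below is made up purely to show the calculation.

```python
def top_k_accuracy(ranked_predictions, true_tags, k):
    """Fraction of posts where at least one of the top-k recommended tags is correct."""
    hits = 0
    for ranked, truth in zip(ranked_predictions, true_tags):
        if set(ranked[:k]) & set(truth):
            hits += 1
    return hits / len(true_tags)

# Hypothetical ranked recommendations and ground-truth tags for three posts.
ranked_predictions = [
    ["python", "list", "sorting", "django", "string"],
    ["c#", "java", "exception", "file-io", "android"],
    ["html", "css", "jquery", "php", "javascript"],
]
true_tags = [["python", "sorting"], ["java"], ["javascript", "ajax"]]

# Accuracy for the top 1 through top 5 recommended tags.
accuracies = {k: top_k_accuracy(ranked_predictions, true_tags, k) for k in range(1, 6)}
for k, acc in accuracies.items():
    print(f"top {k}: {acc:.2f}")
```

Under this definition accuracy can only stay the same or rise as k grows, because a larger k never removes a tag from the candidate set.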

•RANDOM FOREST CLASSIFIER

•ADABOOST CLASSIFIER

•EXTRATREES CLASSIFIER

•Score of post

•User’s reputation

•Age of user account

•Score of other posts of user

•Post content
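To make the classifier step concrete, here is a minimal sketch of AdaBoost over one-feature decision stumps, written in plain Python. It is an illustration under stated assumptions, not the presenter's code: the four numeric columns (post score, user reputation, account age in days, average score of the user's other posts) and all values are hypothetical, the textual "post content" feature is left out, and labels are encoded as +1 for closed and -1 for open.

```python
import math

def stump_predict(row, j, thr, pol):
    """A decision stump: predict pol if feature j is <= thr, else -pol."""
    return pol if row[j] <= thr else -pol

def best_stump(X, y, w):
    """Search every (feature, threshold, polarity) for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(row, j, thr, pol) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_fit(X, y, n_estimators=5):
    """Train AdaBoost on labels in {+1, -1}; returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_estimators):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Misclassified samples gain weight, correct ones lose it; then normalize.
        w = [wi * math.exp(-alpha * t * stump_predict(row, j, thr, pol))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical rows: [post score, reputation, account age (days), avg score of other posts]
X = [
    [-3, 10, 5, -1.0],
    [-1, 1, 2, 0.0],
    [0, 15, 30, -0.5],
    [5, 2500, 900, 3.2],
    [2, 800, 400, 1.5],
    [8, 12000, 2000, 6.0],
]
y = [1, 1, 1, -1, -1, -1]  # +1 = closed, -1 = open (made-up labels)

model = adaboost_fit(X, y, n_estimators=5)
print(adaboost_predict(model, [-2, 5, 3, -0.2]))
```

The toy data is separable by a single stump, so every boosting round picks the same split. On real data the re-weighting step is what pushes later stumps toward the posts earlier ones got wrong. Random Forest and ExtraTrees differ in that they average many independently grown trees instead of re-weighting samples between rounds.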

•Flow chart of Closed Question

Graph shows the importance of different features based on Random Forest

Graph shows the importance of different features based on AdaBoost

Graph shows the importance of different features based on ExtraTrees

The following graph shows the comparison of accuracy with different numbers of features

The following graph shows the comparison of accuracy with different numbers of estimators

Comparison between three classifiers

On the basis of closed questions found:

Accuracy comparison:

On the basis of estimators:

Accuracy comparison:

On the basis of changing training set count:

•RANDOM FOREST CLASSIFIER

•ADABOOST CLASSIFIER

•EXTRATREES CLASSIFIER

•Score of post

•User’s reputation

•Age of user account

•Score of other posts of user

•Post content

•Flow chart of Deletion

Though deleted questions are mostly relevant, they are removed by reputed authors, who do this to protect their reputation on Stack Overflow.

Accuracy comparison:

On the basis of estimators:

Accuracy comparison:

On the basis of changing training set count:

•Tag recommendation has been implemented with and without feedback. We found that we achieve better accuracy with feedback.

•‘Closed’ question prediction has been implemented with three different classifiers and with different numbers of features and estimators. We found that AdaBoost achieves the best accuracy.

•Deleted question prediction has likewise been implemented with all three classifiers. The results indicate that many questions are worth deleting, but some should be restored.

•Increase the accuracy of our algorithms.

•Predicting trends on Stack Overflow.

•Predicting and finding unanswered questions.

•Predict the quality of answers with non-textual features.