Link Anomaly Detection


description

Emerging topics in social media have become very important. Link Anomaly Detection is a method used to find emerging topics.


Table of contents

1 ABSTRACT
2 OVERVIEW
2.1 Purpose of the project
2.2 Existing system
2.3 Proposed system
3 REQUIREMENT SPECIFICATION
3.1 Hardware requirements
3.2 Software requirements
4 FEASIBILITY STUDY
4.1 Technical feasibility
4.2 Operational feasibility
4.3 Economic feasibility
5 LANGUAGE SPECIFICATION
5.1 Introduction to Java
5.2 JavaScript
5.3 JSP
5.4 Servlet
5.5 MySQL Database
5.6 NetBeans
5.7 Apache Tomcat
5.8 GlassFish
5.9 Web application
6 SYSTEM DESIGN
6.1 System Architecture
6.2 Data flow diagrams
6.3 E-R Diagrams
6.4 UML Diagrams
7 SYSTEM DESCRIPTION
8 CODING
9 SYSTEM TESTING
9.1 Introduction to Testing
9.2 Test Cases
10 OUTPUT SCREENS
11 CONCLUSION
12 BIBLIOGRAPHY

1. ABSTRACT

Detection of emerging topics is receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged in social-network posts includes not only text but also images, URLs, and videos. This project focuses on the emergence of topics signalled by the social aspects of these networks. Specifically, it focuses on mentions of users, that is, links between users that are generated dynamically through replies, mentions, and retweets. The project proposes a probability model of the mentioning behaviour of a social network user, and proposes detecting the emergence of a new topic from the anomalies measured through the model. By aggregating anomaly scores from hundreds of users, the project can detect emerging topics based only on the reply/mention relationships in social-network posts. The results show that the proposed mention-anomaly-based approaches can detect new topics at least as early as text-anomaly-based approaches, and in some cases much earlier when the topic is poorly identified by the textual contents of posts.

2. OVERVIEW

This overview gives the background for the specific requirements of the project. It describes the actors involved and the architecture diagram, and states the assumptions and dependencies. It also supports the specific, functional, and supplementary requirements.

2.1 PURPOSE OF THE PROJECT

Communication over social networks, such as Facebook and Twitter, is gaining importance in our daily lives. Since the information exchanged over social networks includes not only text but also URLs, images, and videos, social networks are challenging test beds for the study of data mining. In particular, we are interested in the problem of detecting emerging topics from social streams, which can be used to create automated "breaking news", or to discover hidden market needs or underground political movements. Compared to conventional media, social media are able to capture the earliest, unedited voice of ordinary people. Therefore, the challenge is to detect the emergence of a topic as early as possible with a moderate number of false positives.

2.2 EXISTING SYSTEM

An emerging topic is something people feel like discussing, commenting on, or forwarding to their friends. Conventional approaches to topic detection have mainly been concerned with the frequencies of textual words.

DISADVANTAGES OF EXISTING SYSTEM:

A term-frequency-based approach can suffer from the ambiguity caused by synonyms or homonyms.

It may also require complicated pre-processing, depending on the target language.

Moreover, it cannot be applied when the contents of the messages are mostly non-textual.

In contrast, the "words" formed by mentions are unique, require little pre-processing to obtain, and are available regardless of the nature of the contents.
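As a rough illustration of how little pre-processing mentions need, the following minimal Java sketch pulls the mentioned user names out of a post with a single regular expression. The class name `MentionExtractor` and the handle syntax (an `@` followed by letters, digits, or underscores) are assumptions made for this example and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MentionExtractor {
    // Assumed handle syntax: '@' followed by letters, digits, or underscores.
    private static final Pattern MENTION = Pattern.compile("@([A-Za-z0-9_]+)");

    /** Returns the user names mentioned in a post, in order of appearance. */
    static List<String> extractMentions(String post) {
        List<String> mentions = new ArrayList<>();
        Matcher matcher = MENTION.matcher(post);
        while (matcher.find()) {
            mentions.add(matcher.group(1)); // group 1 is the name without '@'
        }
        return mentions;
    }
}
```

For example, `MentionExtractor.extractMentions("RT @alice: thanks @bob_99!")` yields the names `alice` and `bob_99`. A pattern this simple would also match the part of an e-mail address after the `@`, so a real system would probably need a stricter rule.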

2.3 PROPOSED SYSTEM

The proposed system takes a new approach to detecting the emergence of topics in a social network stream.

The basic idea of this project is to focus on the social aspect of the posts, as reflected in the mentioning behaviour of users, instead of the textual contents.

It uses a probability model that captures both the number of mentions per post and the frequency with which each user is mentioned (the mentionee).
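One way to picture such a model is the sketch below. It is a simplified stand-in, not the project's actual model: it assumes a geometric distribution for the number of mentions per post and a smoothed frequency estimate for each mentionee, and it scores a post by its negative log-likelihood. The class `MentionModel` and the smoothing constants `alpha`, `beta`, and `gamma` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MentionModel {
    private final double alpha;  // assumed smoothing for "post ends here"
    private final double beta;   // assumed smoothing for "one more mention"
    private final double gamma;  // assumed probability mass for unseen mentionees
    private final Map<String, Integer> mentioneeCounts = new HashMap<>();
    private int posts = 0;       // number of training posts
    private int mentions = 0;    // total mentions in the training posts

    MentionModel(double alpha, double beta, double gamma) {
        this.alpha = alpha;
        this.beta = beta;
        this.gamma = gamma;
    }

    /** Adds one past post of the user, given the names it mentions. */
    void train(List<String> mentionsInPost) {
        posts++;
        for (String name : mentionsInPost) {
            mentions++;
            mentioneeCounts.merge(name, 1, Integer::sum);
        }
    }

    /** Probability that a post contains exactly k mentions (geometric model). */
    double countProbability(int k) {
        double stop = (posts + alpha) / (posts + mentions + alpha + beta);
        return stop * Math.pow(1.0 - stop, k);
    }

    /** Smoothed probability of mentioning a given user; hash-table lookup. */
    double mentioneeProbability(String name) {
        Integer count = mentioneeCounts.get(name);
        double weight = (count == null) ? gamma : count;
        return weight / (mentions + gamma);
    }

    /** Anomaly score of a new post: its negative log-likelihood under the model. */
    double anomalyScore(List<String> mentionsInPost) {
        double score = -Math.log(countProbability(mentionsInPost.size()));
        for (String name : mentionsInPost) {
            score -= Math.log(mentioneeProbability(name));
        }
        return score;
    }
}
```

Under these assumptions, a post that mentions many users, or users the author has never mentioned before, receives a higher score than a post that follows the author's usual pattern.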

ADVANTAGES OF PROPOSED SYSTEM:

The proposed method does not rely on the textual contents of social network posts, so it is robust to rephrasing and can be applied where topics concern information other than text, such as images, video, and audio.

The link-anomaly-based methods performed even better than the keyword-based methods on the "NASA" and "BBC" data sets.

3. REQUIREMENT SPECIFICATION

3.1 HARDWARE REQUIREMENTS

The hardware used for the development of the project is:

System       : Pentium IV, 2.4 GHz
Hard Disk    : 40 GB
Floppy Drive : 1.44 MB
Monitor      : 15-inch VGA colour
Mouse        : Logitech
RAM          : 512 MB

3.2 SOFTWARE REQUIREMENTS

The software used for the development of the project is:

Operating system   : Windows XP/7
Language           : Java
Front End          : JSP, Servlet, JavaScript
IDE                : NetBeans 7.0
Application Server : Apache Tomcat 7.0 / GlassFish
Back End           : MySQL 5.5

4. FEASIBILITY STUDY

A feasibility study is a process that defines exactly what a project is and what strategic issues need to be considered to assess its feasibility, or likelihood of succeeding. Feasibility studies are useful both when starting a new business and when identifying a new opportunity for an existing business. Ideally, the feasibility study process involves making rational decisions about a number of enduring characteristics of a project, including:

Technical feasibility: do we have the technology? If not, can we get it?

Operational feasibility: do we have the resources to build the system? Will the system be acceptable? Will people use it?

Economic feasibility: are the benefits greater than the costs?

4.1 TECHNICAL FEASIBILITY

Technical feasibility is concerned with the existing computer system (hardware, software, etc.) and the extent to which it can support the proposed addition. For example, if particular software works only on a computer with a higher configuration, additional hardware is required. This involves financial considerations, and if the budget is a serious constraint, the proposal will be considered not feasible.

4.2 OPERATIONAL FEASIBILITY

Operational feasibility is a measure of how well a proposed system solves the problems and takes advantage of the opportunities identified during scope definition, and how well it satisfies the requirements identified in the requirements analysis phase of system development.

4.3 ECONOMIC FEASIBILITY

Economic analysis is the most frequently used method for evaluating the effectiveness of a candidate system. More commonly known as cost/benefit analysis, the procedure is to determine the benefits and savings that are expected from a candidate system and compare them with costs. If benefits outweigh costs, then the decision is made to design and implement the system.

5. LANGUAGE SPECIFICATIONS

5.1 INTRODUCTION TO JAVA:

Java is a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere", meaning that code that runs on one platform does not need to be recompiled to run on another. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of computer architecture. Java is, as of 2014, one of the most popular programming languages in use, particularly for client-server web applications, with a reported 9 million developers. Java was originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++, but it has fewer low-level facilities than either of them.

The original and reference implementation Java compilers, virtual machines, and class libraries were originally released by Sun under proprietary licences. As of May 2007, in compliance with the specifications of the Java Community Process, Sun relicensed most of its Java technologies under the GNU General Public License.

The Java compiler

When you program for the Java platform, you write source code in .java files and then compile them. The compiler checks your code against the language's syntax rules, then writes out bytecode in .class files. Bytecodes are standard instructions targeted to run on a Java virtual machine. In adding this level of abstraction, the Java compiler differs from other language compilers, which write out instructions suitable for the CPU chipset the program will run on.

The JVM

At run time, the JVM reads and interprets .class files and executes the program's instructions on the native hardware platform for which the JVM was written. The JVM interprets the bytecodes just as a CPU would interpret assembly-language instructions. The difference is that the JVM is a piece of software written specifically for a particular platform. The JVM is the heart of the Java language's "write once, run anywhere" principle. Your code can run on any chipset for which a suitable JVM implementation is available. JVMs are available for major platforms like Linux and Windows, and subsets of the Java language have been implemented in JVMs for mobile phones and hobbyist chips.

The Garbage Collector

Rather than forcing you to keep up with memory allocation, the Java platform provides memory management out of the box. When your Java application creates an object instance at run time, the JVM automatically allocates memory space for that object from the heap, which is a pool of memory set aside for your program to use. The Java garbage collector runs in the background, keeping track of which objects the application no longer needs and reclaiming memory from them. This approach to memory handling is called implicit memory management because it doesn't require you to write any memory-handling code. Garbage collection is one of the essential features of Java platform performance.

The Java Development Kit

When you download a Java Development Kit (JDK), you get, in addition to the compiler and other tools, a complete class library of prebuilt utilities that help you accomplish just about any task common to application development. The best way to get an idea of the scope of the JDK packages and libraries is to check out the JDK API documentation.

The Java Runtime Environment

The Java Runtime Environment (JRE) includes the JVM, code libraries, and components that are necessary for running programs written in the Java language. It is available for multiple platforms. You can freely redistribute the JRE with your applications, according to the terms of the JRE license, to give the application's users a platform on which to run your software. The JRE is included in the JDK.

Features of the Java Language

Java has many features, including the following:

Java is Simple

Several features make Java a simple language. It is easy to learn and was developed by taking the best features from other languages, mainly C and C++. Java is especially easy to learn for those who already know object-oriented programming concepts. Java also removes common sources of programming errors, because it provides automatic memory management and eliminates explicit pointers.

Java is Platform Independent

Java provides the facility to "write once, run anywhere". No language achieves this ideal completely, but Java comes close. Java supports cross-platform programs by compiling them into an intermediate code known as bytecode, which can be interpreted on any system that has a Java Virtual Machine.

Java is Object-oriented

An object-oriented language must support the characteristics of OOP, and Java supports all of the characteristics needed to be object-oriented. In Java, almost everything is treated as an object to which methods are applied. Languages such as Objective-C and C++ also fulfil these characteristics, yet they are not fully object-oriented, because they support structured (procedural) programming as well as object-oriented programming. Java is considered fully object-oriented because the class is the outermost level of program structure: there are no stand-alone methods, constants, or variables. The primitive data types are not objects themselves, but they can be converted into objects by using the wrapper classes.
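The conversion between primitive values and wrapper objects mentioned above can be sketched as follows. The class name `WrapperDemo` is made up for this example; `Integer.valueOf` and the automatic boxing and unboxing it shows are standard Java.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapperDemo {
    /** Sums boxed integers; each element is unboxed to an int automatically. */
    static int sum(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {
            total += value; // auto-unboxing: Integer -> int
        }
        return total;
    }

    public static void main(String[] args) {
        int primitive = 42;                         // a primitive, not an object
        Integer boxed = Integer.valueOf(primitive); // explicit boxing into a wrapper object
        List<Integer> values = new ArrayList<>();   // collections hold objects only
        values.add(boxed);
        values.add(8);                              // autoboxing: int -> Integer
        System.out.println(sum(values));            // prints 50
    }
}
```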

Java is distributed

Java's standard library supports widely used protocols such as HTTP and FTP. Internet programmers can call library functions for these protocols to access files on any remote machine on the internet, in much the same way as they access files on their local system.

Java is Secure

Java does not expose memory pointers to the programmer. Java programs run inside a restricted area known as the sandbox. The security manager determines what a class may access, such as reading and writing files on the local disk. Java supports public-key encryption, which allows Java applications to transmit data over the internet in a secure, encrypted form. The bytecode verifier checks classes after they are loaded.

1. No memory pointers

2. Programs run inside the virtual machine sandbox.

3. Array index limit checking
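The third point, array index limit checking, can be observed directly. In the minimal sketch below (the class and method names are chosen only for illustration), an out-of-range read is stopped by the JVM with an `ArrayIndexOutOfBoundsException` instead of touching memory outside the array.

```java
final class BoundsCheckDemo {
    /** Returns true if reading data[index] is rejected by the JVM's bounds check. */
    static boolean isRejected(int[] data, int index) {
        try {
            int ignored = data[index]; // every array access is range-checked
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;               // the invalid access never reaches memory
        }
    }
}
```

With a three-element array, `isRejected(data, 2)` returns false, while `isRejected(data, 3)` and `isRejected(data, -1)` return true.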

Java is compiled and interpreted

Java source code is compiled to bytecode, which is then interpreted by a Java virtual machine (JVM). This provides portability to any machine for which a virtual machine has been written. The interpreter reads the bytecode and translates it on the fly into computations. The two steps of compilation and interpretation allow for extensive code checking and improved security.

Java is Robust

Java has strong memory management with automatic garbage collection. It carries out type checking at both compile time and run time, making sure that every data structure is clearly defined and typed: the compiler checks the program for errors, and the interpreter checks for run-time errors. These features make the Java language robust.

Java is Portable

The "write once, run anywhere" feature makes Java portable. Programs are run on many types of computers and operating systems. By porting an interpreter for the Java Virtual Machine to any computer hardware or operating system, one is assured that all code compiled for the JVM will run on that system. This forms the basis for Java's portability.

5.2 JavaScript:

JavaScript is a dynamic computer programming language. It is most commonly used as part of web browsers, whose implementations allow client-side scripts to interact with the user, control the browser, communicate asynchronously, and alter the document content that is displayed. It is also used in server-side network programming with frameworks such as Node.js, in game development, and in the creation of desktop and mobile applications.

JavaScript is classified as a prototype-based scripting language with dynamic typing and first-class functions. This mix of features makes it a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles.

Despite some naming, syntactic, and standard library similarities, JavaScript and Java are otherwise unrelated and have very different semantics. The syntax of JavaScript is actually derived from C, while the semantics and design are influenced by the Self and Scheme programming languages. JavaScript is also used in environments that aren't web-based, such as PDF documents, site-specific browsers, and desktop widgets. Newer and faster JavaScript virtual machines and platforms built upon them have also increased the popularity of JavaScript for server-side web applications. On the client side, JavaScript has traditionally been implemented as an interpreted language, but more recent browsers perform just-in-time compilation.

5.3 JSP

JavaServer Pages (JSP) is a server-side programming technology that provides a dynamic, platform-independent method for building web-based applications. JSPs have access to the entire family of Java APIs, including the JDBC API to access enterprise databases. JSP may be viewed as a high-level abstraction of Java servlets. JSPs are translated into servlets at runtime; each JSP servlet is cached and re-used until the original JSP is modified. JSP can be used independently or as the view component of a server-side model–view–controller design, normally with JavaBeans as the model and Java servlets as the controller. This is a type of Model 2 architecture.

JSP allows Java code and certain pre-defined actions to be interleaved with static web markup content, with the resulting page being compiled and executed on the server to deliver a document. The compiled pages, as well as any dependent Java libraries, use Java bytecode rather than a native software format. Like any other Java program, they must be executed within a Java virtual machine that integrates with the server's host operating system to provide an abstract, platform-neutral environment.

JSPs are usually used to deliver HTML and XML documents, but through the use of OutputStream, they can deliver other types of data as well.

The web container creates JSP implicit objects such as pageContext, application (the ServletContext), session, request, and response.

A JavaServer Pages compiler is a program that parses JSPs and transforms them into executable Java servlets. A program of this type is usually embedded in the application server and runs automatically the first time a JSP is accessed, but pages may also be precompiled for better performance, or compiled as part of the build process to test for errors.

5.4 Servlet:

A servlet is a Java program that executes within a web server or an application server, acting as a middle layer between requests sent from a web client and a database or application on the HTTP server. Using servlets, you can generate web pages dynamically, obtain information from users through web forms, and display records from a database.

Servlets are most often used to:

1. Process or store data that was submitted from an HTML form.

2. Provide dynamic content, such as the results of a database query.

3. Manage state information that does not exist in the stateless HTTP protocol, such as filling the articles into the shopping cart of the appropriate customer.

Technically, a servlet is a Java class that conforms to the Java Servlet API, the standard for implementing Java classes that respond to requests. The javax.servlet.http package defines HTTP-specific subclasses for communication between the servlet and the servlet container. You can therefore use a servlet to add dynamic content to a web server using the Java platform. The generated content is usually HTML, but it may be in other formats such as XML. Servlets can also maintain state in session variables through the use of HTTP cookies or URL rewriting. Servlets are usually packaged in a WAR file.

5.5 MySQL Database

MySQL is one of the most popular open-source relational SQL database management systems, and it is widely used for developing web-based software applications.

A Relational Database Management System (RDBMS) is software that:

Enables you to implement a database with tables, columns, and indexes.

Guarantees the referential integrity between rows of various tables.

Updates the indexes automatically.

Interprets an SQL query and combines information from various tables.

RDBMS Terminology:

Before explaining the MySQL database system, let us review a few definitions related to databases.

Database: A database is a collection of tables with related data.

Table: A table is a matrix with data. A table in a database looks like a simple spreadsheet.

Column: One column (data element) contains data of one and the same kind, for example the column postcode.

Row: A row (also called a tuple, entry, or record) is a group of related data, for example the data of one subscription.

Redundancy: Storing data twice, redundantly, to make the system faster.

Primary Key: A primary key is unique. A key value cannot occur twice in one table. With a key, you can find at most one row.

Foreign Key: A foreign key is the linking pin between two tables.

Compound Key: A compound key (composite key) is a key that consists of multiple columns, because one column is not sufficiently unique.

Index: An index in a database resembles an index at the back of a book.

Referential Integrity: Referential integrity makes sure that a foreign key value always points to an existing row.

MySQL is a fast, easy-to-use RDBMS used by many small and large businesses. MySQL was originally developed, marketed, and supported by MySQL AB, a Swedish company, and is now developed by Oracle Corporation. MySQL is popular for many good reasons:

MySQL is released under an open-source license, so the community edition costs nothing to use.

MySQL is a very powerful program in its own right. It handles a large subset of the functionality of the most expensive and powerful database packages.

MySQL uses a standard form of the well-known SQL data language.

MySQL works on many operating systems and with many languages, including Java.

MySQL works very quickly and works well even with large data sets.

MySQL works well with PHP, a popular language for web development.

MySQL supports large databases, up to 50 million rows or more in a table. In older MySQL versions the default file size limit for a table is 4 GB, but this can be increased.

MySQL is customizable. The open-source GPL license allows programmers to modify the MySQL software to fit their own specific environments.

5.6 NetBeans:

NetBeans is an integrated development environment for developing primarily with Java, but also with other languages, in particular PHP, C/C++, and HTML5. It is also an application platform framework for Java desktop applications and others. The NetBeans IDE is written in Java and can run on Windows, OS X, Linux, Solaris, and other platforms supporting a compatible JVM. The NetBeans Platform allows applications to be developed from a set of modular software components called modules. Applications based on the NetBeans Platform can be extended by third-party developers. The NetBeans team actively supports the product and seeks feature suggestions from the wider community.

5.7 Apache Tomcat:

Apache Tomcat is an open-source web server and servlet container developed by the Apache Software Foundation. Tomcat implements several Java EE specifications, including Java Servlet, JavaServer Pages (JSP), Java EL, and WebSocket, and provides a "pure Java" HTTP web server environment for Java code to run in.

5.8 GlassFish:

GlassFish is an open-source application server project started by Sun Microsystems for the Java EE platform and now sponsored by Oracle Corporation. The supported version is called Oracle GlassFish Server.

GlassFish is the reference implementation of Java EE and as such supports Enterprise JavaBeans, JPA, JavaServer Faces, JMS, RMI, JavaServer Pages, servlets, etc. This allows developers to create enterprise applications that are portable and scalable, and that integrate with legacy technologies. Optional components can also be installed for additional services.

5.9 Web Application:

Tomcat has also added user-based and system-based web application enhancements to support deployment across a variety of environments. It also manages sessions and applications across the network.

A number of additional components may be used with Apache Tomcat. Users can build these components themselves if they need them, or download them from one of the mirrors.

6. SYSTEM DESIGN

6.1 SYSTEM ARCHITECTURE

6.2 DATA FLOW DIAGRAMS

LEVEL 0:

LEVEL 1:

LEVEL 2:

6.3 ER DIAGRAM

[ER diagram not reproduced. Recoverable labels: entities Admin, User, Post, Comments, Friend; attributes Username, Password, Email, Post ID, Post Content, Post Date, Comment ID, Comment, Friend ID, Friend name; relationships Find Emerging Topic, Detect anomaly, Write Post, Make Friend.]

6.4 UML DIAGRAMS

Use case diagram:

Class diagram:

Sequence diagram:

Activity diagram:

7. SYSTEM DESCRIPTION

The system consists of the following modules: Event Detection Streams, Event Description Module, User Profiling in Social Media, Kleinberg's Burst-Detection Method, and Data Set.

1. Event Detection Streams

Microblogs have become an important source for reporting real-world events. A real-world occurrence reported in microblogs is also called a social event. Social events may hold critical materials that describe the situations during a crisis. In real applications, such as crisis management and decision making, monitoring the critical events over social streams will enable watch officers to analyze a whole situation that is a composite event, and make the right decision based on the detailed contexts such as what is happening, where an event is happening, and who is involved. Although there has been significant research effort on detecting a target event in social networks based on a single source, in a crisis we often want to analyze the composite events contributed by different social users. So far, the problem of integrating ambiguous views from different users is not well investigated. To address this issue, we propose a novel framework to detect composite social events over streams, which fully exploits the information of social data over multiple dimensions. Specifically, we first propose a graphical model called location-time constrained topic (LTT) to capture the content, time, and location of social messages. Using LTT, a social message is represented as a probability distribution over a set of topics by inference, and the similarity between two messages is measured by the distance between their distributions. Then, the events are identified by conducting efficient similarity joins over social media streams. To accelerate the similarity join, we also propose a variable dimensional extendible hash over social streams. We have conducted extensive experiments to prove the high effectiveness and efficiency of the proposed approach.
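To make "distance between their distributions" concrete, the sketch below compares two topic distributions with the Jensen-Shannon divergence. This particular measure is an assumption made for illustration, since the text does not say which distance is used; the class name `TopicDistance` is likewise hypothetical.

```java
final class TopicDistance {
    /**
     * Jensen-Shannon divergence between two topic distributions of equal
     * length. It is 0 for identical distributions and ln 2 for distributions
     * that share no topic.
     */
    static double jensenShannon(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions must have the same length");
        }
        double divergence = 0.0;
        for (int i = 0; i < p.length; i++) {
            double mid = 0.5 * (p[i] + q[i]); // mixture of the two distributions
            divergence += 0.5 * klTerm(p[i], mid) + 0.5 * klTerm(q[i], mid);
        }
        return divergence;
    }

    /** One term a * ln(a / b) of a Kullback-Leibler sum, with 0 * ln 0 taken as 0. */
    private static double klTerm(double a, double b) {
        return a > 0.0 ? a * Math.log(a / b) : 0.0;
    }
}
```

Two messages would then be treated as similar when this value falls below a chosen threshold; the threshold itself would have to be tuned on real data.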

2. Event description module

The rise of social media services in recent years has created huge streams of information that can be very valuable in a variety of scenarios. What precisely these scenarios are, and how the data streams can efficiently be analyzed for each scenario, is still largely unclear at this point in time and has therefore created significant interest in industry and academia. In this paper, we describe a novel algorithm for geo-spatial event detection on social media streams. We monitor all posts on Twitter issued in a given geographic region and identify places that show a high amount of activity. In a second processing step, we analyze the resulting spatio-temporal clusters of posts with a machine learning component in order to detect whether they constitute real-world events or not. We show that this can be done with high precision and recall. The detected events are finally displayed to a user on a map, at the location where they happen and while they happen.
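The first processing step, finding places with a high amount of activity, could look roughly like the sketch below. It assumes that posts arrive as latitude/longitude pairs and that "places" are cells of a fixed-size grid; both the grid approach and the names `ActivityGrid`, `cellOf`, and `hotCells` are illustrative assumptions, and the second, machine-learning step is not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ActivityGrid {
    /** Maps a coordinate to the key of the grid cell that contains it. */
    static String cellOf(double lat, double lon, double cellSizeDeg) {
        long row = (long) Math.floor(lat / cellSizeDeg);
        long col = (long) Math.floor(lon / cellSizeDeg);
        return row + ":" + col;
    }

    /**
     * Returns the keys of all cells containing at least minPosts posts.
     * Each post is given as {latitude, longitude} in degrees.
     */
    static List<String> hotCells(double[][] posts, double cellSizeDeg, int minPosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (double[] post : posts) {
            counts.merge(cellOf(post[0], post[1], cellSizeDeg), 1, Integer::sum);
        }
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minPosts) {
                hot.add(entry.getKey());
            }
        }
        return hot;
    }
}
```

A fixed grid splits clusters that straddle a cell border, so a production system would more likely use a density-based clustering method; the grid is used here only because it keeps the idea short.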

3. User profiling in social media

A user profile is a visual display of personal data associated with a specific user, or a customized desktop environment. A profile therefore refers to the explicit digital representation of a person's identity. A user profile can also be considered the computer representation of a user. A profile can be used to store the description of the characteristics of a person. This information can be exploited by systems that take the person's characteristics and preferences into account. Profiling is the process of constructing a profile by extracting it from a set of data. User profiles can be found in operating systems, computer programs, recommender systems, and dynamic websites (such as online social networking sites or bulletin boards).

A social networking service is a platform to build social networks or social relations among people who share interests, activities, backgrounds, or real-life connections. A social network service consists of a representation of each user (often a profile), his or her social links, and a variety of additional services. Social networks are web-based services that allow individuals to create a public profile, to create a list of users with whom to share connections, and to view and traverse the connections within the system. Most social network services are web-based and provide means for users to interact over the Internet, such as e-mail and instant messaging. Social network sites are varied, and they incorporate new information and communication tools such as mobile connectivity, photo/video sharing, and blogging. Online community services are sometimes considered a social network service, though in a broader sense, a social network service usually means an individual-centered service, whereas online community services are group-centered. Social networking sites allow users to share ideas, pictures, posts, activities, events, and interests with people in their network.

A social network is a social structure made up of a set of social actors (such as individuals or organizations) and a set of the dyadic ties between these actors. The social network perspective provides a set of methods for analyzing the structure of whole social entities as well as a variety of theories explaining the patterns observed in these structures.[1] The study of these structures uses social network analysis to identify local and global patterns, locate influential entities, and examine network dynamics.

4. Kleinberg’s Burst-Detection Method

In addition to the change-point detection based on SDNML (sequentially discounting normalized maximum likelihood) coding followed by DTO (dynamic threshold optimization) described in previous sections, we also test the combination of our method with Kleinberg's burst-detection method. More specifically, we implemented a two-state version of Kleinberg's burst-detection model. The reason we chose the two-state version was because in this experiment we expect no [...]

The proposed link-anomaly-based change-point detection is highly scalable. Every step described in the previous subsections requires only linear time in the length of the analyzed time period. The predictive distribution for the number of mentions can be computed in linear time in the number of mentions. The predictive distribution for the mention probability can be computed efficiently using a hash table. Aggregation of the anomaly scores from different users takes linear time in the number of users, which could be a computational bottleneck but can be easily parallelized. SDNML-based change-point detection requires two sweeps over the analyzed time period. Kleinberg's burst-detection method can be efficiently implemented with dynamic programming.
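The dynamic-programming formulation of a two-state burst model can be sketched as below. This is a minimal illustration and not the project's implementation: it assumes the input is the sequence of time gaps between consecutive events (for instance, between posts with high anomaly scores), that the burst state emits events `s` times faster than the normal state, and that entering the burst state costs `gamma * ln(n)`. The class name and parameter names are chosen for this example.

```java
final class TwoStateBurstDetector {
    /**
     * Labels each gap between consecutive events as normal (0) or burst (1)
     * by finding the cheapest state sequence with dynamic programming.
     *
     * @param gaps  time gaps between consecutive events (non-negative)
     * @param s     rate multiplier of the burst state, s > 1
     * @param gamma scales the cost of moving from normal to burst
     */
    static int[] detect(double[] gaps, double s, double gamma) {
        int n = gaps.length;
        if (n == 0) {
            return new int[0];
        }
        double total = 0.0;
        for (double gap : gaps) {
            total += gap;
        }
        if (total <= 0.0 || s <= 1.0) {
            throw new IllegalArgumentException("need positive total time and s > 1");
        }
        double[] rate = {n / total, s * n / total}; // event rate of each state
        double enterBurst = gamma * Math.log(n);    // leaving the burst is free

        double[][] cost = new double[n][2]; // cheapest cost ending in each state
        int[][] from = new int[n][2];       // predecessor state on that path
        for (int t = 0; t < n; t++) {
            for (int state = 0; state < 2; state++) {
                // Negative log-density of an exponential gap at this state's rate.
                double emission = rate[state] * gaps[t] - Math.log(rate[state]);
                // The automaton starts in the normal state before the first gap.
                double viaNormal = (t == 0 ? 0.0 : cost[t - 1][0])
                        + (state == 1 ? enterBurst : 0.0);
                double viaBurst = (t == 0) ? Double.POSITIVE_INFINITY : cost[t - 1][1];
                from[t][state] = viaNormal <= viaBurst ? 0 : 1;
                cost[t][state] = Math.min(viaNormal, viaBurst) + emission;
            }
        }
        // Backtrack from the cheaper final state.
        int[] states = new int[n];
        states[n - 1] = cost[n - 1][0] <= cost[n - 1][1] ? 0 : 1;
        for (int t = n - 1; t > 0; t--) {
            states[t - 1] = from[t][states[t]];
        }
        return states;
    }
}
```

Each gap is visited once and only two states are compared per step, so the running time is linear in the number of events, in line with the scalability remarks above.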

5. Data set

The four data sets we collected are called "Job hunting", "Youtube", "NASA", and "BBC", and each of them corresponds to a user-organized list in Togetter. For each list, we extracted the Twitter users that appeared in the list and collected Twitter posts from those users. Note that we collected Twitter posts up to 30 days before the time period of interest for each user; thus, the number of posts we analyzed was much larger than the number of posts listed in Togetter.

"Job hunting": This data set is related to a controversial post by a famous person in Japan that "the reason students having difficulty finding jobs is, because they are stupid" and various replies to that post. The keyword used in the keyword-based methods was "Job hunting."

"Youtube": This data set is related to the recent leakage of a confidential video by a Japan Coast Guard officer. The keyword used in the keyword-based methods was "Senkaku."

"NASA": This data set is related to the discussion among Twitter users interested in astronomy that preceded NASA's press conference about the discovery of an arsenic-eating organism.

"BBC": This data set is related to angry reactions among Japanese Twitter users against a BBC comedy show that asked "who is the unluckiest person in the world" (the answer is a Japanese man who got hit by nuclear bombs in both Hiroshima and Nagasaki but survived).
