Data Analysis and Simulation Modeling
BY – VARUN SHARMA
Briefing
The first half of this report deals with simulation modeling, that is, generating data through computer simulation when no real data is available.
The second half covers data analysis and making predictions from learning examples.
Some important Terms…
Data Analysis is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Simulation modeling is the process of creating and analyzing a digital prototype of a physical model to predict its performance in the real world.
Monte Carlo Simulations
Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
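As a minimal sketch of the idea (not taken from the slides), the helper below estimates a probability by repeating a random trial many times and counting how often it succeeds. The names `estimate_probability` and `two_heads` are hypothetical, chosen only for illustration.

```python
import random

def estimate_probability(trial, n_trials, seed=None):
    """Estimate P(trial succeeds) from n_trials independent runs."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if trial(rng))
    return successes / n_trials

def two_heads(rng):
    """One trial: flip two fair coins; succeed if both land heads."""
    return rng.random() < 0.5 and rng.random() < 0.5

if __name__ == "__main__":
    # The exact answer is 0.25; the estimate approaches it as n_trials grows.
    print(estimate_probability(two_heads, 100_000, seed=0))
```

The error of such an estimate shrinks roughly with the square root of the number of trials, which is why the models below use several hundred thousand runs.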
First Model
You are given 6 balls in a bag: three are white and the other three are black. You pick three balls with your eyes closed. Find the probability that all three are of the same color.
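    import random

    def run():
        c = [1, 1, 1, 2, 2, 2]  # 1 and 2 stand for the two colors
        a = []
        for i in range(3):
            a.append(random.choice(c))
            c.remove(a[i])  # draw without replacement
        # Three balls of color 1 sum to 3; three of color 2 sum to 6
        return sum(a) == 3 or sum(a) == 6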
Observation:
Running this simulation 500k times, we get:
Output: 0.099574
This is very close to the exact value given by probability theory, 0.1.
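The exact value can be checked by counting combinations: of the C(6,3) = 20 ways to pick three balls, only 2 are single-colored, giving 2/20 = 0.1. A minimal sketch of that calculation follows; the function name `exact_same_color` is hypothetical.

```python
from fractions import Fraction
from math import comb

def exact_same_color(per_color, draws, colors=2):
    """Exact probability that `draws` balls taken without replacement
    from `colors * per_color` balls all share one color."""
    total = colors * per_color
    favorable = colors * comb(per_color, draws)
    return Fraction(favorable, comb(total, draws))

p = exact_same_color(per_color=3, draws=3)
print(p, float(p))  # 1/10 0.1
```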
Modification of the model:
Everything is the same, but this time you are given 8 balls in total, 4 of each color.
    import random

    def run():
        # Declared and initialized every time the function is called
        # (that is, in each iteration)
        c = [1, 1, 1, 1, 2, 2, 2, 2]
        a = []
        for i in range(3):
            a.append(random.choice(c))
            # Removes the first instance of a[i] in the list
            # to simulate drawing without replacement
            c.remove(a[i])
        return sum(a) == 3 or sum(a) == 6
Observation:
Running this simulation the same 500k times, we get:
Output: 0.143306
This is very close to the exact value of 1/7, or about 0.1429.
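Both models can be driven by one parameterised routine. The sketch below is an alternative formulation, not the code from the slides: it uses `random.sample` to draw without replacement instead of `choice` followed by `remove`, and the names `same_color_trial` and `simulate` are hypothetical.

```python
import random

def same_color_trial(rng, per_color=4, draws=3):
    """Draw `draws` balls without replacement; True if all match."""
    bag = ["white"] * per_color + ["black"] * per_color
    picked = rng.sample(bag, draws)  # sampling without replacement
    return len(set(picked)) == 1

def simulate(n_trials, per_color=4, draws=3, seed=None):
    """Fraction of trials in which every drawn ball has the same color."""
    rng = random.Random(seed)
    hits = sum(same_color_trial(rng, per_color, draws) for _ in range(n_trials))
    return hits / n_trials

if __name__ == "__main__":
    print(simulate(500_000, per_color=3))  # should land near 0.1
    print(simulate(500_000, per_color=4))  # should land near 1/7
```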
HIV Virus Simulation
[Figures: No Drugs; Drugs with Change]
Observation
In case of No Drugs, the virus propagates without any barrier and grows exponentially.
In the simulation with drugs, however, the viruses initially grow slowly, picking up resistances along the way. When we change the drug given to the patient, the virus population drops significantly.
In the meantime, the average population resistant to the given drugs starts to rise. After a few life cycles, the average total virus population equals the average resistant population.
This means that only the viruses that developed resistance survived, so every virus was resistant in the end.
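The slides do not show the model behind these plots, so the following is only a rough sketch of how such a simulation could be set up. Everything in it is an assumption made for illustration: the class and function names, the probabilities, the population cap, and the simplification to a single drug switched on at a fixed step.

```python
import random

class Virus:
    def __init__(self, resistant):
        self.resistant = resistant  # True if resistant to the drug

def step(viruses, rng, drug_on, max_pop=1000,
         birth_prob=0.1, clear_prob=0.05, mut_prob=0.005):
    """Advance the population by one time step (all rates are assumed)."""
    # Each virus is cleared by the body with probability clear_prob.
    survivors = [v for v in viruses if rng.random() > clear_prob]
    density = len(survivors) / max_pop
    children = []
    for v in survivors:
        if drug_on and not v.resistant:
            continue  # the drug blocks reproduction of non-resistant viruses
        # Reproduction slows down as the population nears max_pop.
        if rng.random() < birth_prob * (1 - density):
            flip = rng.random() < mut_prob  # rare mutation toggles resistance
            children.append(Virus(v.resistant != flip))
    return survivors + children

def simulate(steps=300, drug_at=150, start=100, seed=0):
    """Return a list of (total, resistant) counts, one pair per step."""
    rng = random.Random(seed)
    viruses = [Virus(False) for _ in range(start)]
    history = []
    for t in range(steps):
        viruses = step(viruses, rng, drug_on=(t >= drug_at))
        resistant = sum(v.resistant for v in viruses)
        history.append((len(viruses), resistant))
    return history

if __name__ == "__main__":
    for t, (total, resistant) in enumerate(simulate()):
        if t % 50 == 49:
            print(t + 1, total, resistant)
```

In a model of this kind, the non-resistant viruses can no longer reproduce once the drug is present, so over time the resistant count makes up most of the total, which is the pattern the observation describes.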
Machine Learning
Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.
In this report, I will be dealing with Regression Analysis using Supervised Machine Learning.
Regression Analysis
[Figures: Dataset; Model]
Observation
Theta found by gradient descent: -3.630291, 1.166362
For the city with a population of 35,000, we predict a profit of 4519.767868
For the city with a population of 70,000, we predict a profit of 45342.450129
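To make the numbers above concrete, here is a minimal sketch of single-variable batch gradient descent and of the arithmetic behind the two predictions. The learning rate, iteration count, and function names are assumptions, not the settings used for the slides. The predictions match the reported values when population and profit are both expressed in units of 10,000, so that scaling is assumed here.

```python
def gradient_descent(xs, ys, alpha=0.01, iterations=1500):
    """Fit y = theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

def predict(theta0, theta1, x):
    return theta0 + theta1 * x

# Reproduce the reported predictions from the reported theta,
# assuming population and profit are both in units of 10,000.
theta0, theta1 = -3.630291, 1.166362
print(predict(theta0, theta1, 3.5) * 10_000)  # about 4519.76
print(predict(theta0, theta1, 7.0) * 10_000)  # about 45342.43
```

The small differences from the reported figures come from theta being rounded to six decimal places.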
Multivariate Gradient Descent
Estimating Cost of House:
Dataset: [Area (sq. feet), Bedrooms] [Price]
Normalizing the features...
Running gradient descent for the normalized dataset...
Theta computed from gradient descent: 334302.063993, 100087.116006, 3673.548451
The prediction for a 3-bedroom house with an area of 1650 sq. feet: $289314.620338
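The two steps in this output, feature normalization followed by gradient descent, can be sketched as below. This is an illustrative pure-Python version, not the code behind the slides: the function names, learning rate, and iteration count are assumptions, and the housing dataset itself is not reproduced here. Note that a new input must be scaled with the same means and standard deviations as the training data before predicting.

```python
from statistics import mean, stdev

def normalize(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [stdev(c) for c in cols]
    scaled = [[(v - mu) / s for v, mu, s in zip(row, mus, sigmas)]
              for row in rows]
    return scaled, mus, sigmas

def gradient_descent_multi(rows, ys, alpha=0.1, iterations=400):
    """Batch gradient descent; theta[0] is the intercept term."""
    X = [[1.0] + list(r) for r in rows]
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(iterations):
        errors = [sum(t * x for t, x in zip(theta, row)) - y
                  for row, y in zip(X, ys)]
        grads = [sum(e * row[j] for e, row in zip(errors, X)) / m
                 for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

def predict(theta, features, mus, sigmas):
    """Scale a raw feature vector like the training data, then apply theta."""
    scaled = [(v - mu) / s for v, mu, s in zip(features, mus, sigmas)]
    return theta[0] + sum(t * x for t, x in zip(theta[1:], scaled))
```

Normalizing first matters because area (in the thousands) and bedroom count (single digits) differ by orders of magnitude, which would otherwise force a very small learning rate.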
Machine Learning for Indian Railways
With advanced computers and storage techniques available, Indian Railways can generate and store data like never before.
The problem arises when this data becomes so large that it cannot be analyzed by conventional methods.
But the possibilities remain enormous. CRIS is currently working on models to predict train arrival delays, possible component breakdowns, and more.
Future Aspects
Whenever a train runs late, it inconveniences passengers, delays schedules, and calls into question the reliability of Indian Railways' services.
Events like these tend to follow patterns, and train arrival times are no exception. When we analyze weather, season, date, and time, we see a pattern in how these factors affect arrival times.
Beyond that, we can identify the 'hotspots' of delays in train arrivals. By feeding all these factors into the system, we can predict the chance of a given train running late at a particular time, and by how much.
This helps us plan ahead and provide a better service.
Thank You!