Volume 2 Issue 3
Abstract
In this paper we use sampled data to predict the incidence of homicide in
India. To achieve this we use the Crime in India data together with poverty
and unemployment data. After preliminary analysis, we process the data and
fit various models, both linear and polynomial. Furthermore, we compare these
models and choose the one that best explains the resultant data. Consequently,
we highlight that poverty in rural and urban areas and unemployment in urban
areas are necessary to predict homicidal incidences in India.
Framework
A. Data-set
The data used are Crime in India 2010, rural and urban poverty 2010, and rural and urban unemployment rates in
TABLE 1
Correlation between poverty and unemployment
Using a polynomial approach and adding a square root feature has two advantages. Firstly, this particular choice gives us a polynomial curve. Secondly, when comparing intra-feature groups, the square feature gives emphasis to the independent variable, while the square root feature decreases the importance of that particular variable.

E. Experimentation
When modeling a 3D plot we are limited to two independent variables. The three features used in modeling are derived from two independent variables, hence the three features can be mapped to a 3D domain. Moreover, the difficulty lies in the various variables which may or may not affect crime: poverty in rural, urban, and combined areas, and unemployment in rural, urban, and combined areas. Even after discarding models that give the square root feature to unemployment, we are left with multiple inter-variable and intra-variable group models, which we iterate over in the experimental stage to find a model that describes the crime incidences with the greatest accuracy and least error.
III. METHODOLOGY
A. Gradient Descent
A squared error function is used as a heuristic (cost function) for the multivariate linear and polynomial regression [4]:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Fig. 5 Equation for Cost Fit Function

The independent variables are still two. The sum is divided by m so as to remove any dependence of the output on the number of samples.

The gradient descent algorithm is as follows (equation 2):

$$\text{repeat until convergence: } \left\{ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \right\}$$

where
j indexes the columns,
i indexes the rows,
$y^{(i)}$ is the observed value for sample i,
$h_\theta(x^{(i)})$ is the predicted value.
Note: $x_0^{(i)} = 1$.

Moreover, feature scaling [4] can be done by (equation 4):

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the mean and $s_i$ is the standard deviation.

B. Multivariate Polynomial Regression
The general multivariate function used is

$$h_\theta(x) = \theta^T x$$

where $h_\theta(x)$ is the predicted value, $\theta^T$ is the transpose of the parameter vector $\theta$, and x is the new sample to be predicted.

Table II
Polynomial regression:

$$h_\theta(x) = \theta_0 \sqrt{x_1} + \theta_1 x_2 + \theta_2 x_2^2$$

where $x_1$ is the first feature and $x_2$ is the second feature.
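
The cost function, the gradient descent update, feature scaling and the polynomial feature map can be combined into one small program. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function names, the learning rate, the iteration count and the synthetic data are all assumptions made for illustration. The constant column follows the note that $x_0 = 1$, although the polynomial hypothesis as printed shows no intercept term.

```python
import numpy as np


def polynomial_features(x1, x2):
    """Map two independent variables to the features sqrt(x1), x2, x2**2."""
    return np.column_stack([np.sqrt(x1), x2, x2 ** 2])


def scale_features(F):
    """Standardize each column: (x - mean) / standard deviation."""
    mu = F.mean(axis=0)
    s = F.std(axis=0)
    return (F - mu) / s, mu, s


def add_bias(F):
    """Prepend the constant column x0 = 1 (an assumption, see the lead-in)."""
    return np.column_stack([np.ones(len(F)), F])


def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)**2)."""
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * len(y))


def gradient_descent(X, y, alpha=0.3, iterations=20000):
    """Batch gradient descent; every theta_j is updated simultaneously."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta


# Synthetic, made-up data purely for illustration (not the paper's data set).
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 50.0, size=40)
x2 = rng.uniform(1.0, 10.0, size=40)
y = 3.0 * np.sqrt(x1) + 2.0 * x2 + 0.5 * x2 ** 2

F, mu, s = scale_features(polynomial_features(x1, x2))
X = add_bias(F)
theta = gradient_descent(X, y)

# A new sample is scaled with the training mean and standard deviation.
new = (polynomial_features(np.array([25.0]), np.array([4.0])) - mu) / s
prediction = float(add_bias(new) @ theta)
print(cost(X, y, np.zeros(X.shape[1])), cost(X, y, theta), prediction)
```

Scaling matters here because the square feature is much larger in magnitude than the square root feature, which would otherwise force a very small learning rate.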
Note that if the model is trained using feature-scaled variables, the new sample must also be scaled in the same way before prediction.

Fig. 6 High end points

C. Multivariate Linear Regression
The gradient descent equation is a multivariate algorithm. Therefore, to get a linear model, we give it a linear function instead of a polynomial function [6]. We know (equation 7):

$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

IV. EXPERIMENT
In this stage we test various cases against multivariate linear and polynomial models and observe their adjusted R-squared and predicted R-squared values. In the first part, we test various cases of poverty against unemployment (inter-feature-wise), and in
The values of adjusted R-squared and predicted R-squared (based on the Predicted Residual Error Sum of Squares, PRESS, statistic) [8] indicate how well the model explains the variability of the data around the mean. The greater the value, the better the model explains the variability in the data. Furthermore, the Residual Standard Error summarizes the typical size of the residuals, that is, the differences between the observed values and the estimated or predicted values. High values of adjusted R-squared and predicted R-squared indicate a high capability to predict the variability in the data.
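
The three statistics can be computed directly from a fitted model. The following is a minimal sketch in Python with NumPy using their standard textbook definitions. It fits by ordinary least squares (`numpy.linalg.lstsq`) rather than by gradient descent, and the function name `fit_metrics` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_metrics(X, y):
    """Return adjusted R^2, predicted R^2 (via PRESS) and residual standard error.

    X must already contain the intercept column x0 = 1.
    """
    n, k = X.shape  # k counts the intercept column as well
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta
    rss = float(residuals @ residuals)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Leverages h_ii: the diagonal of the hat matrix X (X^T X)^-1 X^T.
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    # PRESS sums the squared leave-one-out prediction errors.
    press = float(((residuals / (1.0 - leverage)) ** 2).sum())
    predicted_r2 = 1.0 - press / tss
    rse = float(np.sqrt(rss / (n - k)))
    return adjusted_r2, predicted_r2, rse


# Synthetic, made-up data purely for illustration (not the paper's data set).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=30)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=30)
X = np.column_stack([np.ones_like(x), x])

adjusted_r2, predicted_r2, rse = fit_metrics(X, y)
print(adjusted_r2, predicted_r2, rse)
```

Predicted R-squared is based on leave-one-out errors, so it penalizes a model that fits the training points well but predicts held-out points poorly.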
TABLE III Comparison between Linear and Polynomial Models Inter Feature

Case | Linear Adjusted R-Squared | Linear Predicted R-Squared | Linear Residual Standard Error | Polynomial Adjusted R-Squared | Polynomial Predicted R-Squared | Polynomial Residual Standard Error
Poverty Rural and Unemployment Rural | 0.8007 | 0.7541 | 524.6 | 0.916 | 0.8333 | 340.6
Poverty Rural and Unemployment Urban | 0.905 | 0.8718 | 362.2 | 0.946 | 0.9228 | 273.1
Poverty Rural and Unemployment Combined | 0.8271 | 0.7251 | 488.6 | 0.9194 | 0.8315924 | 333.6
Poverty Urban and Unemployment Rural | 0.8591 | 0.7682578 | 441.2 | 0.8891 | 0.8124214 | 391.4
Poverty Urban and Unemployment Urban | 0.8179 | 0.7466558 | 501.4 | 0.8493 | 0.7808036 | 456.1
Poverty Urban and Unemployment Combined | 0.8444 | 0.7414832 | 463.6 | 0.8722 | 0.8110839 | 420.1
Poverty Combined and Unemployment Rural | 0.8558 | 0.8020817 | 446.2 | 0.9412 | 0.9093449 | 285.1
Poverty Combined and Unemployment Urban | 0.9131 | 0.8717862 | 346.4 | 0.9495 | 0.9284094 | 264
Poverty Combined and Unemployment Combined | 0.8639 | 0.7718077 | 433.6 | 0.9389 | 0.8890291 | 290.5
Unemployment Rural and Urban (Rural Emphasis) | 0.7137 | 0.6526985 | 628.8 | 0.7624 | 0.7210551 | 572.9
Unemployment Rural and Urban (Urban Emphasis) | 0.7137 | 0.6526985 | 628.8 | 0.6956 | 0.6010112 | 648.3
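
The model selection implied by Table III can be reproduced with a few lines of Python. The following is a minimal sketch that copies the adjusted R-squared and residual standard error values of the nine poverty and unemployment cases from the table. The dictionary layout and variable names are assumptions made for illustration.

```python
# case -> (linear adjusted R^2, polynomial adjusted R^2, polynomial RSE),
# values copied from Table III for the nine poverty/unemployment cases.
results = {
    "Poverty Rural and Unemployment Rural": (0.8007, 0.916, 340.6),
    "Poverty Rural and Unemployment Urban": (0.905, 0.946, 273.1),
    "Poverty Rural and Unemployment Combined": (0.8271, 0.9194, 333.6),
    "Poverty Urban and Unemployment Rural": (0.8591, 0.8891, 391.4),
    "Poverty Urban and Unemployment Urban": (0.8179, 0.8493, 456.1),
    "Poverty Urban and Unemployment Combined": (0.8444, 0.8722, 420.1),
    "Poverty Combined and Unemployment Rural": (0.8558, 0.9412, 285.1),
    "Poverty Combined and Unemployment Urban": (0.9131, 0.9495, 264.0),
    "Poverty Combined and Unemployment Combined": (0.8639, 0.9389, 290.5),
}

# Case with the highest polynomial adjusted R^2.
best_by_adjusted_r2 = max(results, key=lambda case: results[case][1])
# Case with the lowest polynomial residual standard error.
best_by_rse = min(results, key=lambda case: results[case][2])
# Cases where the polynomial model beats the linear model on adjusted R^2.
improved = [case for case, (lin, poly, _) in results.items() if poly > lin]

print(best_by_adjusted_r2)
print(best_by_rse)
print(len(improved), "of", len(results), "cases improve with the polynomial model")
```

For the values in Table III, both criteria select the "Poverty Combined and Unemployment Urban" case, and the polynomial model has the higher adjusted R-squared in all nine cases.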
the models, which explains the main reason why the linear model fails.

V. COMPARATIVE STUDY
As we can see in Fig. 3, the polynomial curve highlighted in blue closely models the low-end sample points, exhibiting a linear rise. However, the linear plane highlighted in green overestimates these sample points. In Fig. 4, the linear model highlighted in green overestimates the sample point of the state of Uttar Pradesh. Moreover, in the mid section it underestimates the points of An Pr (Andhra Pradesh) and its neighbors.

The polynomial model displays a curve passing closer to the mid-section points and Uttar Pradesh. Overall, the polynomial model fits the data better than the linear plane. It also handles outliers better.

VI. FUTURE RESEARCH
We are planning similar research for the year 2016 when the data become available. Any methods to track poverty are welcome and can be integrated into our model. Furthermore, our polynomial model is just one of the models that explain homicidal incidences; other features such as income inequality (Gini index) could also be added to the model, which could then model incidences of crime in other areas. As crimes arising from poverty and unemployment are addressed by government policies, we expect the polynomial model to explain crime to a lesser extent year by year, eventually resulting in a correlation of poverty and unemployment with incidences of crime below 0.1.

VII. CONCLUSION
Using statistical machine learning we have found a link between crime, poverty and unemployment. The experiment conducted on various permutations of the independent variables found that poverty in rural and urban areas is the most viable predictor of the incidences of homicide in the year 2010, with poverty in urban areas contributing more to homicides than poverty in rural areas. Another interpretation, obtained from comparing poverty and unemployment, is that unemployment in urban areas together with poverty in both rural and urban areas predicts the incidences of homicide.

A limitation of our approach in future studies is that the time taken to survey
9 Page 1-11 © MANTECH PUBLICATIONS 2017. All Rights Reserved
Journal of Research in Computer Science and Engineering
[4] Bin Mohamad, Ismail; Dauda Usman, Standardization and Its
[9] Daniel Adler, Duncan Murdoch and others (2016). rgl: 3D Visualization