Detecting Hot Spots Using Cluster Analysis and GIS
Tony H. Grubesic
Center for Urban and Regional Analysis and Department of Geography
The Ohio State University
1036 Derby Hall
154 North Oval Mall
Columbus, OH 43210
Email: grubesic.1@osu.edu
Alan T. Murray
Department of Geography
The Ohio State University
1036 Derby Hall
154 North Oval Mall
Columbus, OH 43210
Email: murray.308@osu.edu
Abstract
One of the more popular approaches for the detection of crime hot spots is cluster analysis. Implemented in a wide variety of software packages, including CrimeStat, SPSS, SAS, and S-Plus, cluster analysis can be an effective method for determining areas exhibiting elevated concentrations of crime. However, detecting hot spots with clustering techniques remains particularly challenging, because of uncertainty about the appropriate number of clusters to generate and about how to establish the significance of the individual clusters identified. This paper highlights the potential utility of several diagnostics for resolving such issues.
Introduction
Crime mapping and analysis have evolved significantly over the past 30 years. In the beginning, many agencies utilized city and precinct maps with colored pins to visualize individual crime events and crime-plagued areas. Today, with the rapid advancement of technology, computer-based techniques for exploring, visualizing, and explaining the occurrences of criminal activity have become essential. One of the more influential tools facilitating exploration of the spatial distribution of crime has been GIS (Ratcliffe and McCullagh, 1999; Harries, 1999). As Murray et al. (2001) note, it is the ability to combine spatial information with other data that makes GIS so valuable. Furthermore, the sheer quantity of information available to most analysts necessitates an intelligent computational system, able to integrate a wide variety of data and facilitate the identification of patterns with minimal effort.
Fundamental to the explanation of criminal activities in a spatial context are certain environmental factors, such as the physical layout of an area, proximity to various services, and land use mixes, all of which are likely to influence criminal behavior (Greenburg and Rohe, 1984). Issues of access, exposure, opportunity, and the availability of targets are also important elements in helping explain crime from an environmental perspective (Cohen and Felson, 1979; Brantingham and Brantingham, 1981). Not surprisingly, research indicates that certain areas are more prone to higher concentrations of crime. Widely labeled as "hot spots", such areas are often targets of increased manpower from law enforcement agencies in an effort to reduce crime. Where resources are concerned, the identification of hot spots is helpful because most police departments are understaffed. As such, the ability to prioritize intervention through a geographic lens is appealing (Levine, 1999a).
Operationally, the delineation of hot spot boundaries is somewhat arbitrary. As Levine (1999a) notes, crime density is measured over a continuous area. Therefore, the boundaries separating hot spots of crime from areas without enough activity to merit the label hot spot are perceptual constructs. Moreover, depending on the scale of geographic analysis, a hot spot can mean very different things (Harries, 1999).
Recent studies by the Crime Mapping Research Center at the National Institute of Justice categorize hot spot detection and analysis methods. These techniques have been classified as follows (Jefferis, 1998; Harries, 1999): visual interpretation, choropleth mapping, grid cell analysis, spatial autocorrelation, and cluster analysis. Further, twelve different variations on the five classes of hot spot identification techniques were systematically documented and evaluated, yielding several important results. Although there are a variety of methods for detecting hot spots in crime event data, no single approach was found to be superior to the others.
What does become clear in previous work on hot spot detection is that combining cartographic visualization of crime events with statistical tools provides valuable insight for detecting areas of concern. Results of the CMRC (1998) study suggest that tests of spatial autocorrelation are a good approach for detecting hot spots. Implemented in a variety of packages, including CrimeStat 1.1, SpaceStat, S-Plus Spatial Statistics, and SAGE, both global and local tests of spatial autocorrelation assist in crime analysis. As demonstrated by Szakas (1998), the implementation of the Getis-Ord statistic (Gi statistic) in SpaceStat provided very good measures of crime hot spots for Baltimore County. The utility of spatial autocorrelation and the Gi statistic for hot spot analysis is further supported in the work of Craglia et al. (2000).
Considering the success of statistically grounded tests for hot spot detection, such as the Gi statistic for spatial autocorrelation, it is unfortunate that other well-established statistical tests, such as cluster analysis, are generally viewed to be less useful (Chainey and Cameron, 2000). Gordon (1999) suggests that cluster analysis is one of the most useful methods for exploratory data analysis, especially in large multivariate data sets. If this is the case, why has it failed to help crime analysts in hot spot detection?
Statistical approaches for cluster analysis are widely available in a number of software packages, including CrimeStat, SAS, SPSS, Systat, and S-Plus. However, the evaluation, documentation, and implementation of cluster analysis algorithms, particularly nonhierarchical versions commonly used in crime analysis such as k-means, are unclear and give little direction for useful application (Murray and Estivill-Castro, 1998; Murray and Grubesic, 2002). The gap between what has been developed and what is actually needed for hot spot detection exists for several reasons. First, crime hot spots are spatial phenomena. Therefore, in order to identify elevated concentrations of crime in a geographic area, tools that treat space appropriately are critical. Second, existing approaches for cluster analysis are not necessarily ideal when applied to spatially referenced data. This is best reflected by the relatively poor performance of the k-means algorithm, as implemented in leading statistical packages, for spatial data analysis (Murray and Grubesic, 2002). Given that nonhierarchical clustering approaches have proven fruitful in other research areas, it is premature to deem such techniques too complex or poorly performing for crime analysis, as done by Chainey and Cameron (2000). The failure to date of nonhierarchical techniques is a product of how they are being used and supported.
The purpose of this paper is to explore two of the problematic aspects of cluster analysis for hot spot detection. First, we examine the difficulties in determining the appropriate number of clusters, p, to generate. Second, we highlight several statistical methods that have the potential for establishing the significance of clusters identified as hot spots.
The remainder of this paper is organized as follows. Section 2 outlines the differences between hierarchical and partitioning techniques for cluster analysis. Section 3 examines the problems associated with identifying the appropriate number of clusters. Included is a discussion of several approaches that have the potential to make the identification of the number of clusters more statistically grounded. Section 4 explores the issues of attaching significance to the clusters generated in hot spot detection. Section 5 contains a brief discussion and closing remarks.
2. Clustering Approaches
2.1 Hierarchical
Broadly defined, cluster analysis is a method of classification that places objects in groups based on the characteristics they possess. Bailey and Gatrell (1995) note that all clustering techniques begin in the same fashion: each method begins with the calculation of an (n x n) matrix, D, of dissimilarities between every pair of observations. In most cases, a Euclidean metric is used as the measure of dissimilarity. Based on D, cluster analysis breaks observations into groups, linking the most similar observations together in clusters. For example, hierarchical clustering techniques begin with all observations in separate groups and proceed to join the most similar observations (or groups of observations) according to some prespecified criterion based on D. In hierarchical clustering, nearest neighbor distance is frequently used as the dissimilarity measure (Bailey and Gatrell, 1995). The nearest neighbor measure is a comparison of the distances between two points (or groups of points) with the average distance between all points. If the distance meets the a priori criterion (usually the calculated probabilities of a threshold distance between observations occurring by chance), observations are linked to form a new cluster. This process is repeated until all points have been assigned to a first-order cluster. First-order clusters are then tested for second-order clustering in the same manner. Levine (1999a) notes that this process is repeated until all subclusters have converged into a single cluster or the threshold distance criterion fails.
Although hierarchical clustering allows analysts to examine the concentration of crime events in smaller geographical areas and the links between crime cluster hierarchies (e.g., first-order to second-order), the problems associated with hierarchical clustering techniques can outweigh the benefits. As Bailey and Gatrell (1995, 233) note, although hierarchical clustering optimizes a criterion at each step, there is no guarantee that, if one ends up with p groups, this is the partition of the observations which would optimize this same criterion over all possible partitions of the observations into p groups. In other words, hierarchical clustering procedures frequently generate local rather than global optima. There are also problems associated with the threshold distances used in hierarchical clustering. Levine (1999a) suggests that crime distributions with many incidents (burglary) typically have lower threshold distances than distributions with fewer incidents (murder). As such, hierarchical clustering does not treat space appropriately, producing inconsistent hot spots. Finally, the implementation of a minimum number of observations rule for clusters is arbitrary at best. How many points constitute a meaningful cluster? Ten? Twenty? This effectively eliminates any objectivity in an analysis, as the definition of cluster size is likely to vary between users.
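The agglomerative procedure described above can be illustrated with a minimal Python sketch. It builds the dissimilarity matrix D from Euclidean distances and merges the two closest groups until the nearest neighbor distance exceeds a threshold. The fixed `threshold` value, the function names, and the sample coordinates are assumptions made for illustration; the probability-based threshold described above is not reproduced here.

```python
import math

def dissimilarity_matrix(points):
    """Return the (n x n) matrix D of Euclidean distances between all pairs."""
    return [[math.dist(p, q) for q in points] for p in points]

def single_linkage(points, threshold):
    """Merge the two closest groups (nearest neighbor distance) until
    no pair of groups is closer than `threshold`."""
    D = dissimilarity_matrix(points)
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # nearest neighbor distance between groups a and b
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # threshold criterion fails, so merging stops
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical event coordinates (arbitrary units).
events = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
groups = single_linkage(events, threshold=2.0)
```

Each entry of `groups` lists the indices of the events in one cluster. Because every merge is chosen greedily, the result illustrates the local rather than global optimization that hierarchical procedures are prone to.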
2.2 Partitioning
Partitioning or optimization clustering techniques attempt to split observations into a prespecified number of groups, p, where the specified criterion is optimized globally over all possible splits. Clearly, the disadvantage to this approach is the requirement that the number of groups must be specified a priori.
One of the most prominent statistical techniques for cluster analysis has been the k-means approach proposed in Fisher (1958). This technique is based upon multivariate analysis of variance in the evaluation of homogeneity among entities (Estivill-Castro and Murray, 2000a). Specifically, the scatter matrix of similarity between entities may be evaluated by its trace (Aldenderfer and Blashfield, 1984). Homogeneity is then measured for a grouping of entities using the sum of squares loss function (Rousseeuw and Leroy, 1987).
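As a rough illustration of the sum of squares loss function, the following Python sketch uses a basic alternating heuristic (assign each entity to its nearest center, then recompute centers as group means). This is a generic textbook heuristic chosen for brevity, not necessarily the procedure implemented in any of the packages named in this paper; the function names, the seed, and the sample coordinates are assumptions.

```python
import math
import random

def sse(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def kmeans(points, k, seed=0, max_iter=100):
    """Alternate assignment and centroid update steps from a random start."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a group becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels, sse(points, centers, labels)

# Hypothetical event coordinates (arbitrary units).
events = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, labels, loss = kmeans(events, k=2)
```

The returned `loss` depends on the random starting centers, which is one reason different implementations can report different objective values for the same data.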
Other nonhierarchical clustering approaches have also been developed and utilized. Some alternatives are detailed in Kaufman and Rousseeuw (1990). In the context of spatial application, a review of approaches is given in Murray and Estivill-Castro (1998). More specific to the analysis of crime, a discussion may be found in Murray et al. (2001). If we are intent on identifying areas or entities that are strongly related in some predefined sense, then many nonhierarchical clustering techniques may potentially be useful. This is important and significant because a user could select alternative clustering approaches to conduct analysis. However, it is also possible that a user selects an inappropriate approach because they are not aware of associated biases and inaccuracies. That is, not all clustering approaches are equivalent. Recent geographical research has focused on appropriateness issues in the use and application of nonhierarchical clustering techniques (Murray 1999, 2000a; Murray and Grubesic, 2002). What has been found is that substantial variation exists in the structure and quality of identified clusters. For example, Murray and Grubesic (2002) found that the k-means solutions identified in several leading statistical packages such as SPSS, SAS, and S-Plus deviate by as much as 30% from the best-known solutions (Figure 1).
Figure 1: Performance of Commercial Statistical Packages (k-means)
Source: Murray and Grubesic (2002)
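The deviation measure referred to above can be read as the percentage by which a reported objective value exceeds the best-known value. The short Python sketch below shows that arithmetic. The objective values and the package labels are hypothetical and are not taken from Murray and Grubesic (2002).

```python
def percent_deviation(objective, best_known):
    """Percentage by which an objective value exceeds the best-known value."""
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (objective - best_known) / best_known

# Hypothetical sum of squares values for one data set and one value of p.
best_known = 1000.0
reported = {"package_a": 1000.0, "package_b": 1150.0, "package_c": 1300.0}
deviations = {name: percent_deviation(value, best_known)
              for name, value in reported.items()}
```

Under these made-up numbers, the third package would sit 30% above the best-known solution, the order of magnitude discussed in the text.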
3. How Many Clusters?
The difficulty of identifying the appropriate number of clusters for a given set of crime events is one of the most fundamental shortcomings of nonhierarchical techniques for hot spot detection. Levine (1999a) suggests that both the strength and weakness of the k-means procedure is the ability for the user to define the number of clusters to be generated for a given set of observations. Although most software packages allow the user to specify k groups, it is certainly not a strength in practice. While local knowledge and experience can play a role in hot spot analysis, user-defined parameters such as k groups build significant subjectivity into analysis. Furthermore, implicit in most discussions of the k-means approach in crime analysis is the notion that there are no established methods for determining the optimal number of clusters (Levine, 1999a,b). In fact, the statistics literature outlines numerous methods for detecting the appropriate number of clusters (Gordon 1996; Podani, 1996; Lozano et al., 1996; Milligan and Cooper, 1985). As an example, Milligan and Cooper (1985) assessed the ability of thirty different stopping rules to predict the correct number of clusters in randomly generated data sets. Although some of these rules performed poorly, others performed quite well. More importantly, Milligan and Cooper (1985) suggest that stopping methods developed for hierarchical cluster analysis are easily modified for optimization-based approaches.
One of the more effective procedures for determining the number of clusters in a data set is the cubic clustering criterion (CCC). The CCC, developed by Sarle (1983), is the test statistic provided by the SAS package. Inflection points in the CCC column of SAS output should be examined from n groups down to 1 group. These inflection points are indicative of appropriate cluster groupings for the data, and there may be more than a single inflection point. Alternatively, graphical plots of CCC values may be utilized for analysis. Peaks greater than 2 or 3 on such plots suggest good clustering. Peaks between 0 and 2 suggest potential clusters, but must be interpreted cautiously (Sarle, 1983). The CCC values can also be used in conjunction with the pseudo F (PSF) and pseudo t² statistics in SAS. Both measures provide additional information, with large PSF values suggesting a good stopping point. Inflections in the t² statistic also suggest possible cluster stops.
A second approach Milligan and Cooper (1985) determined to be effective was the Calinski and Harabasz (1974) index. The index is computed as follows:
$$\frac{\mathrm{trace}(B)/(k-1)}{\mathrm{trace}(W)/(n-k)}$$
Trace refers to the sum of squared Euclidean distances between entities and their centroids. Further, n and k are the total number of entities and the number of clusters in the solution, respectively. The B and W terms are the between and pooled within clusters sum of squares and cross products matrices. Better performing than the CCC, the Calinski and Harabasz index performed consistently for varying cluster group simulations (Milligan and Cooper, 1985).
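To make the index concrete, the Python sketch below computes it for a labeled set of points, taking trace W as the sum of squared distances from entities to their cluster centroids and trace B as the size-weighted sum of squared distances from cluster centroids to the overall centroid. The function name and the sample coordinates are assumptions for illustration.

```python
import math

def calinski_harabasz(points, labels):
    """[trace(B)/(k-1)] / [trace(W)/(n-k)] for a given partition."""
    n = len(points)
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("index is defined only for 1 < k < n")

    def mean(ps):
        return tuple(sum(x) / len(ps) for x in zip(*ps))

    grand = mean(points)
    trace_w = 0.0  # pooled within-cluster sum of squares
    trace_b = 0.0  # between-cluster sum of squares
    for members in groups.values():
        c = mean(members)
        trace_w += sum(math.dist(p, c) ** 2 for p in members)
        trace_b += len(members) * math.dist(c, grand) ** 2
    if trace_w == 0.0:
        return math.inf  # every entity coincides with its centroid
    return (trace_b / (k - 1)) / (trace_w / (n - k))

# Hypothetical one-dimensional layout stored as 2-D points.
pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
good = calinski_harabasz(pts, [0, 0, 1, 1])  # compact, well-separated groups
poor = calinski_harabasz(pts, [0, 1, 0, 1])  # groups that straddle the gap
```

Larger values indicate partitions whose clusters are compact relative to their separation, so in practice the index would be compared across candidate values of k.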
Regardless of the stopping technique selected for cluster detection, varying spatial scales of analysis must also be considered. For example, clusters that exist at the neighborhood level might be insignificant at the city level. Thus, Gordon (1998) indicates that stopping rules for cluster analysis have both a global and local component. Global rules are based on the complete data set, typically seeking an optimal index value that compares both within-group and between-group variation (Gordon, 1998). Local rules generally consider whether or not a single cluster should be subdivided into two subclusters (Gordon, 1998). On the surface, global and local perspectives for cluster detection appear to resemble global/local aspects of spatial autocorrelation. Clearly, more research and experimentation are needed to determine potential links, if they exist.
4. Cluster Significance
An essential need in the use of nonhierarchical approaches for the detection of hot spots is to better understand what makes certain clusters more significant in the context of hot spot analysis. In other words, what makes a hot spot "hot" and what makes other clusters "cold"? A body of research does exist for approaching this issue, though its use in the context of hot spot detection remains largely unexplored. In assessing partitions, Gordon (1999) suggests one must address several issues. First, does a specified partition (k) provide compact and isolated clusters? This is closely linked to research in determining the appropriate number of clusters. The implementation of both global and local stopping rules, as outlined in the previous section, will undoubtedly provide additional insight into this process. Second, is it possible to address and validate the internal structure of clusters? The work of Arnold (1979) and Milligan and Mahajan (1980) suggests that Monte Carlo tests of partition validity and significance are potentially useful.
In addition to statistical tests of cluster significance, geometric properties of cluster partitions may also prove useful. Perhaps the most basic indicator of cluster significance is the number of entities in a partition. At face value, groups with a relatively larger number of events are certainly indicative of more criminal activity. However, this does not include any measure of spatial dispersion relative to other clusters. Thus, the generation of a minimum bounding polygon (convex hull) can provide additional insight into the spatial extent of identified clusters (Figure 2). Basic calculations of the area covered by the hull give some indication of cluster compactness. As illustrated in Figure 2, cluster partition 3 is associated with the most compact hull. It is also possible to make basic calculations that consider the number of entities in a partition as they relate to convex hull size. In this case, partition 3 (12 members) has the most compact member/area ratio (0.3904). A final indicator of cluster compactness and potential significance is the average distance between cluster members and the cluster center. As displayed in Figure 2, cluster partition 3 (12 total entities) has the smallest average distance (278.87) between its members and its cluster center. This suggests a more compact cluster group.
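One way such a Monte Carlo test might be set up is sketched below in Python. The observed within-cluster sum of squares is compared with values obtained by clustering the same number of points scattered uniformly over the bounding box of the data. The uniform null model, the restart-based k-means helper, and all names and parameter values are assumptions for illustration; the procedures of Arnold (1979) and Milligan and Mahajan (1980) may differ in their details.

```python
import math
import random

def best_sse(points, k, rng, restarts=5, max_iter=50):
    """Smallest within-cluster sum of squares over several k-means restarts."""
    best = math.inf
    for _ in range(restarts):
        centers = rng.sample(points, k)
        labels = None
        for _ in range(max_iter):
            new = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                   for p in points]
            if new == labels:
                break
            labels = new
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        run = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
        best = min(best, run)
    return best

def monte_carlo_p_value(points, k, simulations=99, seed=0):
    """Share of random point patterns that cluster at least as tightly as the data."""
    rng = random.Random(seed)
    observed = best_sse(points, k, rng)
    xs, ys = zip(*points)
    hits = 0
    for _ in range(simulations):
        # null model (assumed): uniform scatter over the data's bounding box
        fake = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
                for _ in points]
        if best_sse(fake, k, rng) <= observed:
            hits += 1
    return (hits + 1) / (simulations + 1)

# Hypothetical events forming two tight groups.
events = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
          (100, 100), (101, 100), (100, 101), (101, 101), (100.5, 100.5)]
p_value = monte_carlo_p_value(events, k=2)
```

A small p-value indicates that the partition is tighter than would be expected from spatially random events. A bounding box is a crude stand-in for a real study area, so the result should be read as indicative only.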
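The geometric indicators described above can be computed with a few lines of Python. The sketch below builds the convex hull of a cluster's members, measures its area, and reports the member/area ratio and the average distance to the cluster center. It assumes the ratio is members divided by hull area and the center is the mean of the member coordinates; the function names and the sample coordinates are hypothetical and unrelated to Figure 2.

```python
import math

def convex_hull(points):
    """Monotone chain algorithm; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    pairs = zip(vertices, vertices[1:] + vertices[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2

def cluster_geometry(members):
    """Hull area, members per unit area, and mean distance to the cluster center."""
    area = polygon_area(convex_hull(members))
    center = tuple(sum(x) / len(members) for x in zip(*members))
    mean_dist = sum(math.dist(p, center) for p in members) / len(members)
    ratio = len(members) / area if area > 0 else math.inf
    return {"area": area, "members_per_area": ratio, "mean_distance": mean_dist}

# Hypothetical cluster: four corner events and one central event.
summary = cluster_geometry([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

Comparing these summaries across the clusters of a partition gives a simple ranking of compactness, although the values depend on the coordinate units in use.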
Figure 2: Geometric Properties for Determining Cluster Significance
It is important to note that all of the tests previously outlined must be implemented across a series of cluster partitions, 1, 2, ..., p, to better evaluate the spatial characteristics of the crime events or study area. More importantly, although these techniques are promising starting points for future research, additional work is needed to develop more robust geometric and statistical measures for determining cluster significance.
5. Conclusion
This paper has outlined several problematic aspects of optimization-based cluster analysis for crime hot spot detection. Rather than simply dismissing cluster analysis as being too complex for hot spot detection, additional research effort should be directed toward adapting existing statistical and geometric techniques to make cluster detection more intuitive and useful for crime analysts.
References
Aldenderfer, M. and R. Blashfield. 1984. Cluster Analysis. Beverly Hills: Sage Publications.
Arnold, F.J. 1979. A test for clusters. Journal of Marketing Research. 16: 545-551.
Bailey, T.C. and A.C. Gatrell. 1995. Interactive Spatial Data Analysis. London: Longman Scientific and Technical.
Brantingham, P. and P. Brantingham. 1981. Environmental Criminology. Beverly Hills: Sage.
Calinski, R.B. and J. Harabasz. 1974. A dendrite method for cluster analysis. Communications in Statistics, 3. pp 137.
Chainey, S. and J. Cameron. 2000. Understanding Hot Spots. Presentation prepared for 2000 CMRC Conference: Wheredunit? Investigating the Role of Place in Crime and Criminality. San Diego, CA.
CMRC [Crime Mapping Research Center]. 1998. Crime Mapping Research Center Hot Spot Project. URL: http://www.ojp.usdoj.gov/cmrc/whatsnew/hotspot/toc.html
Cohen, L. and M. Felson. 1979. Social Change and Crime Rate Trends: A Routine Activity Approach. American Sociological Review. 44: 588-608.
Craglia, M., R. Haining, and P. Wiles. 2000. A comparative evaluation of approaches to urban crime pattern analysis. Urban Studies. 37(4): 711-729.
Fisher, W. 1958. On grouping for maximum homogeneity. Journal of the American Statistical Association. 53: 789-798.
Gordon, A.D. 1998. How many clusters? An investigation of five procedures for detecting nested cluster structure. In, Data Science, Classification, and Related Methods, edited by C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H. Bock, and Y. Baba. Tokyo: Springer-Verlag.
_____. 1999. Classification. New York: Chapman and Hall/CRC.
Greenburg, S. and W. Rohe. 1984. Neighborhood Design and Crime. Journal of the American Planning Association. 50: 48-61.
Harries, K. 1999. Mapping Crime: Principle and Practice. Washington DC: National Institute of Justice (NCJ 178919).
Jefferis, E. 1998. A Multi-Method Exploration of Crime Hot Spots. Paper presented at 1998 Academy of Criminal Justice Sciences (ACJS) Annual Conference. URL: http://www.ojp.usdoj.gov/cmrc/whatsnew/hotspot/intro.pdf
Levine, N. 1999a. CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident Locations, version 1.1. Washington DC: Ned Levine & Associates / National Institute of Justice.
_____. 1999b. Hot spot analysis using both the Systat k-means routine and a risk assessment. URL: http://www.ojp.usdoj.gov/cmrc/whatsnew/hotspot/kmeans.pdf
Lozano, J.A., P. Larranaga, and M. Grana. 1996. Partitional cluster analysis with genetic algorithms: Searching for the number of clusters. In, Data Science, Classification, and Related Methods, edited by C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H. Bock, and Y. Baba. Tokyo: Springer-Verlag.
Milligan, G.W. and M.C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 50(2): 159-179.
Milligan, G.W. and V. Mahajan. 1980. A note on procedures for testing the quality of a clustering of a set of objects. Decision Sciences. 11: 669-677.
Murray, A.T. 1999. Spatial analysis using clustering methods: evaluating the use of central point and median approaches. Journal of Geographical Systems. 1: 367-383.
_____. 2000a. Spatial characteristics and comparisons of interaction and median clustering models. Geographical Analysis. 32: 1-19.
Murray, A.T. and T.H. Grubesic. 2002. Identifying Nonhierarchical Clusters. International Journal of Industrial Engineering. To appear.
Murray, A.T., I. McGuffog, J.S. Western, and P. Mullins. 2001. Exploratory spatial data analysis techniques for examining urban crime. British Journal of Criminology. 41: 309-329.
Podani, J. 1996. Explanatory variables in classifications and the detection of the optimum number of clusters. In, Data Science, Classification, and Related Methods, edited by C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H. Bock, and Y. Baba. Tokyo: Springer-Verlag.
Ratcliffe, J.H. and M.J. McCullagh. 1999. Hotbeds of crime and the search for spatial accuracy. Journal of Geographical Systems. 1: 385-398.
Rousseeuw, P. and A. Leroy. 1987. Robust Regression and Outlier Detection. New York: John Wiley.
Sarle, W.S. 1983. Cubic Clustering Criterion. SAS Technical Report A-108. Cary, NC: SAS Institute Inc.
Szakas, J. 1998. A Multi-Method Exploration of Crime Hot Spots Software Evaluation: SpaceStat. URL: http://www.ojp.usdoj.gov/cmrc/whatsnew/hotspot/spacestat.pdf
Alternatively, one can begin with all observations in a single group and break them up into separate clusters.
Clusters can also be subject to a criterion that defines the minimum number of points to be included in a group.
EMBED Excel.Chart.8 \s
3
vD 6<@tuvw1234Ybdgimou{6OJQJ^J6CJH*OJQJ6CJOJQJCJOJQJ^J5CJOJQJ\jCJOJQJU0JCJOJQJjCJOJQJUjCJOJQJUCJOJQJOJQJ;34FG!"1Ics)hi3i.78uvST
ij ####7#8#I#J#})~)....//
$a$.5 6 ##6#8#I#d$e$h$i$s$t$%%N&O&Q&R&''((H+I+++.,3,@,F,./8/:///<0D000[2\25$5(6:6j6k6߾߾߾߱߱ߩߚߩߒ6OJQJ]OJQJ5OJQJ\jOJQJUmHnHu6CJOJQJj0JCJOJQJU56CJOJQJ\]6CJOJQJ]5OJQJ\^JCJOJQJ6OJQJ]^JOJQJ^J6OJQJ^J4/H0I0Z2[2'6(6s6668.8/8F8G8:>;>BBBBBB+CDDDGG$a$
k6677.8/8E8:::!:::;;>>\?]?4A5AAABACAAAABBBBBBBBBBB*C+C,C1C2CCCCCDDGH!J"J#J$J3LuQvQwQ´´´´´ª jUOJQJ^J6]<CJOJQJaJ6B*CJOJQJ]phB*CJOJQJphCJH*OJQJ6CJOJQJ5CJOJQJ\CJ6CJOJQJ]CJOJQJOJQJ:GHH2L3LtQuQwQQQmSnS}S~SUU U
UUUUUzU{UUUNVOV$a$
$a$wQQQ@RARnSSUUUCUSUUUUV{VVVWWWWWX\X]XXXXXXCYDY`YYYGZvZ[2[[[%\R\w\\8]z]]]]]]]]]^jCJOJQJU5CJOJQJ0JCJOJQJj CJOJQJUjCJOJQJUCJOJQJ^J5\5CJOJQJ\6CJOJQJ]CJOJQJOJQJ5OJQJ\8OVVV)W*WWWWXXXoYpYZ
ZZZ[[[[.\b\c\\\]]^^
*$a$^h^*_+_r_s_t___E`v`_alaa
bbb7cLccccczdd!eRe"fAfuff&g'gggggghhhh)h*hhhhiiiiiпдЦЙЎ jU
j0JUj0JOJQJU<aJ0JCJOJQJj
CJOJQJUjCJOJQJUOJQJ^J6CJOJQJ]CJOJQJ0JOJQJjQOJQJUjOJQJUOJQJ6OJQJ]3^^__``aab bbb[c\cccddeePfQfff&g'ghh$a$d,
*^hhhh h!h"h#h$h%h&h'h(h)hhii1i2i3i4iii.i/i0i3i4iCJOJQJ jU jUj)?
OJQJUV 1h/ =!"#$%`!Lkױ⑱}^C@9b'x\l~wɏ@TjQ5 IC6*rrSHBp(Ԇ7
j%TR ОbADǁDpeھy3vno߭w rmxx[15ḮQE*'Nڷ)\7G>2(_eQٖl62*~"
dܦ7oFǉZp8g~o
*yxxmgA1*PEtyF=Xe4^`4{:IuVj*wNqQw1GW?]xO:Ert:_Xgkc8K~1
8]cIWp1
?Nq:d(cG{wv60ecƞalc+c1o̘Mq0w
^dEk`MdlckgÌf>eSza>Jxfpb!eQ16Eblc+u0f$iF)oN45W^ gDY7zZ#(bB#Y{f+DrnX+c&uC+jXib)ا1S9GLALBLnhb!L$aLrR%*
hbJX1S9¯rfBŇpNiV8
wYV,+>
_a)V8
X+
wYዬp'+
dOXKfpZvq+~~gMqDYE}[!! 'BDT#OVyf+1?vWaU~l"M9~a=*6='X\aOUͮP3ctM݉گD=v'jQHyo슆."Hެ*'(k658ukS_ߔhۘϐ%Pm+Ξf7
_{zI؎Jӈr<z1:#s;_;xZݓV > ydq^
،soᱷg.,$F9.zɶ{QeӫWy=gGAbq"#%).Ir\e9]Lm`lKK}&I݇~CFPy+^ݮjw@DuZi)2 4W:&w}X1_
[i'vΟ9UYv{Wbނ3bWDSvds֎ϒ.WWs85u;V;7]j2T6}rTqI89o><渰vu֎i9[;Uǂ!gopnz?t~33?'ڔ{1{Kgu쳙؊[1c+&hŘE6=HV0iEuWDKG4p~rv8ɉ*YpCnaϳߥS\4ϟY
10"26NLs"W[`!6NLs"W[ 0zvx͗hUǟ9w?L3j3rw
SjV[yjjT&KJV #!_DT0H?4Ժ=<ϻi˝c{{=2O
B)@&Zdhj
;OߒK_K"%\NfqC9k.\ 1v&<),c8އ}GV.+M TzoYXZ`p}M/]Z@TKttabxCtVR@aY^_ .
~a+#,*M70jVW:S[:[[7tuP4OVUXF?.HQ
MC?ckgk3^f$˱Qe{؈ FܲEssxJO(M4Q=F!v0ǆ(jUfcκs+mSMt8>!#YDǙm~x]3Qspާzh)#wϛ_tO=eyMp& 4_OE"/K!O<5snf'vdg=?9L6envԌ]n4߸X8'rJ5mێKmL2}f餥L}RLg,u2g\_aY=eD$lD$2B^7BQ! oQ! =FTH16cO"Q!o?BHTHP
j"Q!AOgITHfK3uXS?@ABCDEFGHIJLMNOPQRUtYZ[\]^_`abcdefghijklmnopqrswxyz{}~Root Entry F !6W@Data
KWordDocument
ObjectPool` d!6 !6_1068706287!F` d!6` d!6Ole
PRINTX(7CompObjb
!"#$%&'()*,
!FMicrosoft Excel ChartBiff8Excel.Chart.89qZOh+'0HPd
Alan MurrayTony GrubesicxMicrosoft Excel@rS@3˿@[yz)
"
'' Times New RomanTimes New RomanTimes New RomanTimes New Roman"System' "
~ !! "' 0 "
$Jk " ' "
ttUU55GG^^44vvLL""' } !!' q' q0 " u
}}wSwS))ffLjLjT@T@aaddhWhW " }}SS))jj@@aa@@fWfWj " }}wSwS''t)t)=j=j@@22WWu "$`'uu`'  q}
$}}h}'  q
$'  qwS
$SbhwS>wSb'  q
$'  q)
$)>))'  qf
$Qf{fQ'  q
$'  qLj
$j7LjaULj7'  q
$'  qT@
$@?UT@i+T@?'  q
$'  qa
$L+avaL'  qd
$OdyldO'  q
$'  qhW
$WSlhW}BhWS'  q
$'  q "
'q'
q}
h'
q
'
qS
h>'
q
'
q)
>'
q
'
q
'
qj
{U'
q
q'
q@
Up+'
qa
vL'
q@
U++'
q
l'
q
'
qfW
{lQB'
qj
U'
q "
$q'q' q}
$}h}' q
$' qwS
$Sbh>Sb' q'
$<<' qt)
$)_>)_' q
$~~' q
$' q=j
$j(RURj(' q
$' q@
$@U + @' q
$pp' q
$+' q
$l' q2
$GG' qW
$WlBW' q
$))' q' q' ' '
2
0%
2
5% 2
\y10% 2
y15% 2
=y20% 2
y25% 2
y30% 2
y35% 2
y40% 2
ny45% ' '
2
4 2
t5 2
6 2
J7 2
8 2
9
2
10
2
11
2
W12
2
13
2
14
2
15
2
16
2
n17
2
18
2
D19
2
20' ' F'
"2
Number of clusters!$' ' t!
Times New Roman
2
+2Objective value deviation from best known solution
'  "

Jk'
Il'
Il "
+ $+@++
2
SPSS&'&&'  Il'  Il "
+"@2
SPlus&'"'
Il'
Il "
y
yy+$+d@+d2
TSAS&1&' Il' ' "
0
~ !!
' ' 'ObjInfo
WorkbookvOlePres000 DSummaryInformation( A@\p
Tony Grubesic Ba=H5
=hK9X@"1Arial1Arial1Arial1Arial1OArial1'Arial1"Arial1"Arial1.xTimes New Roman1.dTimes New Roman1.Times New Roman1.@Times New Roman1.Times New Roman1.Times New Roman1.Times New Roman1.@Times New Roman1Arial1.nTimes New Roman1Arial1.Times New Roman1._Times New Roman"$"#,##0_);\("$"#,##0\)!"$"#,##0_);[Red]\("$"#,##0\)""$"#,##0.00_);\("$"#,##0.00\)'""$"#,##0.00_);[Red]\("$"#,##0.00\)7*2_("$"* #,##0_);_("$"* \(#,##0\);_("$"* ""_);_(@_).))_(* #,##0_);_(* \(#,##0\);_(* ""_);_(@_)?,:_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* ""??_);_(@_)6+1_(* #,##0.00_);_(* \(#,##0.00\);_(* ""??_);_(@_)"Yes";"Yes";"No""True";"True";"False""On";"On";"Off" + ) , * `@@ / @ @ "p@ / `@@ `@ ` @ `@ ` `  `@  `  `  ` `@@/ "p@/ `@ `@ ` @/ `@/ `) ` ` * `@ ` ) ` * `@ / "p @ *
d
$
$
`JChart3!
objdifferenceZ@kmeansUakron114`iz@
3 @@
RC,KMBON10KMN10CPKMST%DIFF(KMSTN10KM)KMBO = KMEANS Best out.KMST = Initial KMEANS solution provided by SASCPCP(MW)SPSS(km) SPLUS(km)SAS(km)=CPCP(MW) = Optimal Solution evaluated by mutiWeber heuristic
[Figure: Objective value deviation from best known solution (vertical axis, 0% to 45%) plotted against number of clusters (horizontal axis, 4 to 20) for SPSS, SPlus, and SAS.]