- Spatial process-based transfer learning for prediction problems
Free pre-print version: Loading...
Rate this result:
What is this?
Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.
Abstract: Although spatial prediction is a versatile tool for urban and environmental monitoring, the predictive accuracy is often unsatisfactory when limited samples are available from the study area. The present study was conducted to improve the accuracy in such cases through transfer learning, which uses larger datasets from external areas. Specifically, we proposed the SpTrans method, which pre-trains map patterns for each area using spatial process models. These patterns are then used in transfer learning to distinguish between unique patterns in the study area and common patterns across areas. The performance of the proposed SpTrans method was examined using land price prediction, with empirical results suggesting that the model achieves higher prediction accuracy than conventional learning, which does not explicitly consider local spatial dependence.
PubDate: 2025-01-31 DOI: 10.1007/s10109-024-00455-y
- Spatiotemporal forecasting models with and without a confounded covariate
Abstract: The aim of this paper is to analyze the prediction accuracy of multivariate spatiotemporal forecasting models with a confounded covariate versus univariate models without covariates for discrete (count and binary) and continuous response variables by means of theoretical considerations and Monte Carlo simulation. For the simulation, we propose a Bayesian latent Gaussian Markov random fields framework for three types of generalized additive prediction models: (i) a multivariate model with a spatiotemporally confounded covariate only, denoted in the rest of the paper as the multivariate model; (ii) a univariate model with spatiotemporal random effects and their interaction only; and (iii) a full multivariate model consisting of the combination of (i) and (ii), that is, a univariate model combined with a multivariate model. One simulation result is that for all three kinds of response variables, the univariate and the full multivariate model uniformly dominate the multivariate model in terms of prediction accuracy measured by the mean-squared prediction error (MSPE). A second finding is that for discrete variables the univariate model uniformly dominates the full multivariate model. A third result is that for continuous response variables the full multivariate model dominates the univariate model in the case of low confoundedness of the covariate. For high confoundedness, the reverse holds. The results provide important guidelines for practitioners.
PubDate: 2025-01-28 DOI: 10.1007/s10109-024-00454-z
- Correction: A scale-adaptive estimation for mixed geographically and temporally weighted regression models
PubDate: 2025-01-27 DOI: 10.1007/s10109-025-00456-5
- A scale-adaptive estimation for mixed geographically and temporally weighted regression models
Abstract: Mixed geographically and temporally weighted regression (GTWR) models, a combination of linear and spatiotemporally varying coefficient models, have been demonstrated as an effective tool for spatiotemporal data analysis under global homogeneity and spatiotemporal heterogeneity. Simultaneously, multiscale estimation for GTWR models has also attracted wide attention due to its scale flexibility. However, most of the existing estimation methods for mixed GTWR models still have the limitation that either all regression relationships operate at the same spatiotemporal scale, or each coefficient is estimated using very time-consuming back-fitting procedures. In order to improve the estimation accuracy and alleviate the computational burden, we propose a multiscale method with adaptive bandwidths (scale-adaptive for short) for calibrating mixed GTWR (mixed MGTWR) models. In the proposed multiscale estimation approach, a two-step method is used to estimate the constant coefficients and varying coefficients, and then each of the varying coefficients is again estimated by back-fitting procedures with different bandwidth sizes. In addition, we address the calculation of the “hat matrix” in the multiscale estimation for the GTWR model and then derive the hat matrix of the complete MGTWR model. Simulation experiments assess the performance of the proposed scale-adaptive calibration method. The results show that the proposed method is much more efficient than existing estimation methods with regard to estimation accuracy and computational efficiency. Moreover, the proposed scale-adaptive method can also correctly reflect the inherent spatiotemporal operating scales of the explanatory variables. Finally, a real-world example demonstrates the applicability of the proposed scale-adaptive method.
PubDate: 2025-01-05 DOI: 10.1007/s10109-024-00453-0
- JGS Editors’ Choice article
PubDate: 2024-10-01 DOI: 10.1007/s10109-024-00450-3
- Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA
Abstract: The increase in physical inactivity prevalence in the USA has been associated with neighborhood characteristics. While several studies have found an association between neighborhood and health, the relative importance of each component related to physical inactivity, and how this value varies geographically (i.e., across different neighborhoods), remains unexplored. This study ranks the contribution of seven socioecological neighborhood factors to physical inactivity prevalence in Chicago, Illinois, using machine learning models at the census tract level, and evaluates their predictive capabilities. First, we use geographical random forest (GRF), a recently proposed nonlinear machine learning regression method that assesses each predictive factor’s spatial variation and contribution to physical inactivity prevalence. Then, we compare the predictive performance of GRF to geographically weighted artificial neural networks, another recently proposed spatial machine learning algorithm. Our results suggest that poverty is the most important determinant in the Chicago tracts, while green space is the least important determinant in the rise of physical inactivity prevalence. As a result, interventions can be designed and implemented based on specific local circumstances rather than broad concepts that apply to Chicago and other large cities.
PubDate: 2024-10-01 DOI: 10.1007/s10109-023-00415-y
- A random forests-based hedonic price model accounting for spatial autocorrelation
Abstract: This paper introduces a spatially explicit random forests-based hedonic price modeling approach to account for spatial autocorrelation in the data. Spatial autocorrelation is a common data structure in georeferenced data, and controlling associations among spatial objects is crucial for accurate statistical analysis. Validations of machine learning and artificial intelligence applications require using out-of-sample data sets to assess models’ fit on the training dataset. Previous research has shown that nonspatial cross-validation methods, commonly used in machine learning applications for spatial data, often provide over-optimistic results. Some recommended the use of spatial cross-validation methods to obtain more reliable estimates. However, the machine learning models used in these previous studies did not include spatially explicit parameters to account for spatial autocorrelation in the data. Unlike machine learning-based models, statistical-based models such as the spatial lag model can effectively account for spatial autocorrelation in the data. This research applied a two-stage least squares random forests framework to construct a hedonic pricing model incorporating a spatial lag for the Miami-Dade single-family residential parcel sales data. Random forests models are evaluated using K-fold, spatial blocking K-fold, and spatial leave-one-out cross-validation methods. The goodness-of-fit of the tested random forests-based models is evaluated using the coefficient of determination and mean square error scores. Additionally, spatial autocorrelations in residuals from random forests models are investigated by conducting Moran’s I test. Our research indicates that failing to account for spatial autocorrelation in data can lead to unreliable and overly optimistic estimates. However, including a spatially lagged variable substantially reduces fluctuations in goodness-of-fit measures across different validation sets.
PubDate: 2024-10-01 DOI: 10.1007/s10109-024-00449-w
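The core idea in the abstract above — feeding a spatially lagged outcome into a random forest — can be illustrated with a minimal Python sketch. All data and parameter values below are made up, and computing the lag as a plain k-nearest-neighbour average is a simplification of the paper's two-stage least squares framework, not its actual estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic parcels: coordinates, one structural attribute, and prices
# with a spatially autocorrelated component (all values invented).
n = 500
coords = rng.uniform(0, 10, size=(n, 2))
sqft = rng.uniform(800, 3000, size=n)
spatial_effect = np.sin(coords[:, 0]) + np.cos(coords[:, 1])
price = 50 * sqft + 20000 * spatial_effect + rng.normal(0, 5000, size=n)

# Spatially lagged price: mean price among the k nearest neighbours,
# excluding the parcel itself (column 0 of the neighbour index).
k = 8
nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
_, idx = nn.kneighbors(coords)
lagged_price = price[idx[:, 1:]].mean(axis=1)

# Random forest with and without the spatial lag feature; out-of-bag
# scores give a rough out-of-sample comparison.
X_plain = sqft[:, None]
X_lag = np.column_stack([sqft, lagged_price])
rf_plain = RandomForestRegressor(n_estimators=200, oob_score=True,
                                 random_state=0).fit(X_plain, price)
rf_lag = RandomForestRegressor(n_estimators=200, oob_score=True,
                               random_state=0).fit(X_lag, price)
print(rf_plain.oob_score_, rf_lag.oob_score_)
```

Note that a naive out-of-bag comparison like this still ignores the spatial leakage issue the paper raises; the spatial blocking and leave-one-out schemes it evaluates exist precisely because random splits are over-optimistic for spatial data.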
- What drives urban redevelopment activity? Evidence from machine learning and econometric analysis in three American cities
Abstract: This paper uses the question posed in its title — what drives urban redevelopment activity? — to frame a comparison of machine learning and econometric approaches for modeling parcel change. It starts by arguing that geographical science has an obligation to weigh the tradeoffs of methods as they emerge into the mainstream, especially when they spread like wildfire, as machine learning has. The empirical analysis, which makes up the middle sections of the paper, examines parcel changes in Boston, Chicago, and Seattle between 2010 and 2020. Two machine learning approaches, k nearest neighbors and random forest, are benchmarked against an econometric approach, probit. The models are explained in a way that is intended to be accessible to a broad audience and evaluated using intuitive metrics; throughout, an effort is made to draw a clear link between machine learning and econometric methods. The modes of analysis rest on different knowledge bases, so analysts should take care to distinguish between the two. The paper closes with a summary and some concluding thoughts. Overall, it suggests that machine learning and econometric approaches extend the reach of each other’s capabilities and, therefore, should be viewed as complements, not substitutes.
PubDate: 2024-10-01 DOI: 10.1007/s10109-024-00451-2
- A structured comparison of causal machine learning methods to assess heterogeneous treatment effects in spatial data
Abstract: The development of the “causal” forest (CF) by Wager and Athey (J Am Stat Assoc 113(523): 1228–1242, 2018) represents a significant advance in the area of explanatory/causal machine learning. However, this approach has not yet been widely applied to geographically referenced data, which present some unique issues: the random split of the test and training sets in the typical causal forest design fractures the spatial fabric of geographic data. To help solve this issue, we use a simulated dataset with known properties for average treatment effects and conditional average treatment effects to compare the performance of CF models across different definitions of the test/train split. We also develop a new “spatial” T-learner that can be implemented using predictive methods like random forest to provide estimates of heterogeneous treatment effects across all units. Our results show that all of the machine learning models outperform traditional ordinary least squares regression at identifying the true average treatment effect, but are not significantly different from one another. We then apply the preferred causal forest model in the context of analysing the treatment effect of the construction of the Valley Metro light rail (tram) system on on-road CO2 emissions per capita at the block group level in Maricopa County, Arizona, and find that the neighbourhoods most likely to benefit from treatment are those with higher pre-treatment proportions of transit and pedestrian commuting and lower proportions of auto commuting.
PubDate: 2024-10-01 DOI: 10.1007/s10109-023-00413-0
- Beyond visual inspection: capturing neighborhood dynamics with historical Google Street View and deep learning-based semantic segmentation
Abstract: While street view imagery has accumulated over the years, its use to date has been largely limited to cross-sectional studies. This study explores ways to utilize historical Google Street View (GSV) images for the investigation of neighborhood change. Using data for Santa Ana, California, an experiment is conducted to assess to what extent deep learning-based semantic segmentation, processing historical images much more efficiently than visual inspection, enables one to capture changes in the built environment. More specifically, semantic segmentation results are compared for (1) 248 sites with construction or demolition of buildings and (2) two sets of the same number of randomly selected control cases without such activity. It is found that the deep learning-based semantic segmentation can detect nearly 75% of the construction or demolition sites examined, while screening out over 60% of the control cases. The results suggest that it is particularly effective in detecting changes in the built environment with historical GSV images in areas with more buildings, less pavement, and larger-scale construction (or demolition) projects. False-positive outcomes, however, can emerge due to the imperfection of the deep learning model and the misalignment of GSV image points over years, showing some methodological challenges to be addressed in future research.
PubDate: 2024-10-01 DOI: 10.1007/s10109-023-00420-1
- Introduction to the special issue on spatial machine learning
Abstract: While many of the machine learning (ML) and artificial intelligence (AI) methods that are now commonly being used to answer questions across scientific disciplines have been around for some time, their widespread application to spatial data and spatially explicit research questions is much more recent. The large number of excellent review papers and special issues in leading journals published in the last few years — which this issue of the Journal of Geographical Systems takes its place among — attest to the growing interest in the application and development of cutting-edge methodologies for spatial data. This editorial begins by proposing a new inclusive definition for spatial ML, then provides a brief overview of each of the six papers in this special issue, and ends with a suggestion of several possible directions for future research in spatial ML.
PubDate: 2024-10-01 DOI: 10.1007/s10109-024-00452-1
- Point cluster analysis using weighted random labeling
Abstract: This paper proposes a new method of point cluster analysis. There are at least three important points that we need to consider in the evaluation of point clusters. The first is spatial inhomogeneity, i.e., the inhomogeneity of locations where points can be located. The second is aspatial inhomogeneity, which indicates the inhomogeneity of point characteristics. The third is an explicit representation of the geographic scale of analysis. This paper proposes a method that considers these points in a statistical framework. We develop two measures of point clusters: local and global. The former permits us to discuss the spatial variation in point clusters, while the latter indicates the global tendency of point clusters. To test the method’s validity, this paper applies it to the analysis of hypothetical and real datasets. The results supported the soundness of the proposed method.
PubDate: 2024-09-10 DOI: 10.1007/s10109-024-00447-y
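Random labeling, the conditional framework that underlies this line of work, is easy to demonstrate: hold the point locations fixed (preserving spatial inhomogeneity) and permute only the marks. The Python sketch below uses plain, unweighted random labeling with a mean nearest-neighbour statistic on synthetic data — a baseline illustration only, not the paper's weighted variant or its local/global measures.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed point locations (the inhomogeneous backdrop) with a binary
# mark: the 30 "case" points are deliberately clustered in one corner.
n_bg = 200
bg = rng.uniform(0, 10, size=(n_bg, 2))
cases = rng.normal(loc=2.0, scale=0.5, size=(30, 2))
pts = np.vstack([bg, cases])
labels = np.array([0] * n_bg + [1] * 30)

def mean_nn_dist(points):
    """Mean nearest-neighbour distance within a point set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

observed = mean_nn_dist(pts[labels == 1])

# Random-labeling null: reassign marks over the *fixed* locations, so
# the inhomogeneity of where points can occur is preserved.
n_sim = 199
sims = np.empty(n_sim)
for i in range(n_sim):
    perm = rng.permutation(labels)
    sims[i] = mean_nn_dist(pts[perm == 1])

# One-sided pseudo p-value: a small mean NN distance means clustering.
p = (1 + (sims <= observed).sum()) / (n_sim + 1)
print(f"observed = {observed:.3f}, p = {p:.3f}")
```

With the cases packed into one corner, the observed statistic falls well below the simulated ones and the pseudo p-value is small.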
- Implications for spatial non-stationarity and the neighborhood effect averaging problem (NEAP) in green inequality research: evidence from three states in the USA
Abstract: Recent studies on green space exposure have argued that overlooking human mobility could lead to erroneous exposure estimates and their associated inequality. However, these studies are limited as they focused on single cities and did not investigate multiple cities, which could exhibit variations in people’s mobility patterns and the spatial distribution of green spaces. Moreover, previous studies focused mainly on large-sized cities while overlooking other areas, such as small-sized cities and rural neighborhoods. In other words, potential spatial non-stationarity issues in estimating green space exposure inequality remain unclear. To fill these significant research gaps, we utilized commute data of 31,862 people from Virginia, West Virginia, and Kentucky. A deep learning technique was used to extract green spaces from street-view images to estimate people’s home-based and mobility-based green exposure levels. The results showed that the overall inequality in exposure levels reduced when people’s mobility was considered compared to the inequality based on home-based exposure levels, implying the neighborhood effect averaging problem (NEAP). Correlation coefficients between individual exposure levels and their social vulnerability indices demonstrated mixed and complex patterns regarding neighborhood type and size, demonstrating the presence of spatial non-stationarity. Our results underscore the crucial role of mobility in exposure assessments and the spatial non-stationarity issue when evaluating exposure inequalities. The results imply that local-specific studies are urgently needed to develop local policies that precisely alleviate inequality in exposure.
PubDate: 2024-09-04 DOI: 10.1007/s10109-024-00448-x
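The neighborhood effect averaging problem mentioned above has a simple statistical core: averaging exposure over the places a person visits pulls individual exposures toward the population mean, compressing the distribution. The toy simulation below illustrates just that mechanism with invented tract-level exposure values; it is not the study's street-view-based estimation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical green-exposure values by census tract (0-100 scale) and
# commuters assigned a home tract plus a handful of visited tracts.
n_tracts, n_people, n_visited = 300, 5000, 4
tract_exposure = rng.uniform(0, 100, size=n_tracts)
home = rng.integers(0, n_tracts, size=n_people)
visited = rng.integers(0, n_tracts, size=(n_people, n_visited))

home_based = tract_exposure[home]
# Mobility-based exposure: average over the home and visited tracts.
mobility_based = (home_based + tract_exposure[visited].sum(axis=1)) / (n_visited + 1)

# Averaging shrinks the spread of individual exposures -- the
# neighborhood effect averaging problem in miniature. Standard
# deviation serves here as a crude inequality proxy.
print(home_based.std(), mobility_based.std())
```

Because visits here are random, the spread shrinks by roughly the square root of the number of places averaged; in real data, correlated mobility (people visiting places like home) weakens but does not eliminate the effect.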
- Integrating big data with KNIME as an alternative without programming code: an application to the PATSTAT patent database
Abstract: Accessing massive datasets can be challenging for users unfamiliar with programming codes. Combining Konstanz Information Miner (KNIME) and MySQL tools on standard configuration equipment allows for addressing this issue. This research aims to present a methodology that describes the necessary configuration steps in both tools and the required manipulation in KNIME to transmit the information to the MySQL environment for further processing in a database management system (DBMS). In addition, we propose a procedure so that the use of this point-and-click software in research work can gain in reproducibility and, therefore, in credibility in the scientific community. To achieve this, we will use a big database regarding patent applications as a reference, the PATSTAT Global 2023, provided by the European Patent Office (EPO). As is well known, patent data can be a valuable source for understanding innovation dynamics and technological trends, whether for studies on companies, sectors, nations or even regions, at aggregated and disaggregated levels.
PubDate: 2024-09-03 DOI: 10.1007/s10109-024-00445-0
- JGS Editors’ choice article
PubDate: 2024-08-07 DOI: 10.1007/s10109-024-00446-z
- Mobility deviation index: incorporating geographical context into analysis of human mobility
Abstract: Many studies examine the relationship between socioeconomic factors and human mobility indicators. However, it is well documented that mobility levels are also driven by the geographical context where individual movement takes place. Here we test whether accounting for geographical context leads to new or different interpretations of human mobility behavior when studying associations with socioeconomic factors. Specifically, we define the mobility deviation index as the relative level of observed mobility when compared to expected mobility for a specific location, where expected mobility accounts for geographical context. Our results highlight the significant role of context when interpreting spatial patterns of human mobility. We demonstrate that controlling for the effects of geographical context will substantially impact our interpretation of associations between measures of human mobility and socioeconomic variables. These results represent an important step in furthering our understanding of the role of place on human mobility patterns.
PubDate: 2024-07-17 DOI: 10.1007/s10109-024-00444-1
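The definition above — observed mobility relative to contextually expected mobility — can be sketched in a few lines of Python. The context variables, the linear expectation model, and the ratio form of the index are all assumptions for illustration; the abstract does not specify the authors' exact formula or controls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Hypothetical locations: observed trip counts plus two invented
# context variables (road density, distance to the urban core).
n = 400
road_density = rng.uniform(0, 1, size=n)
dist_to_core = rng.uniform(0, 20, size=n)
observed = 100 * road_density - 2 * dist_to_core + 60 + rng.normal(0, 5, n)

# Expected mobility given geographical context (simple linear model).
X = np.column_stack([road_density, dist_to_core])
expected = LinearRegression().fit(X, observed).predict(X)

# Deviation index: observed relative to contextually expected mobility.
# Values above 1 mean more movement than context alone predicts.
mdi = observed / expected
print(mdi.mean(), mdi.std())
```

The index centers near 1 by construction, so its spatial pattern isolates where mobility departs from what the local context would predict.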
- Speeding up estimation of spatially varying coefficients models
Abstract: Spatially varying coefficient models, such as GWR (Brunsdon et al. in Geogr Anal 28:281–298, 1996 and McMillen in J Urban Econ 40:100–124, 1996), find extensive applications across various fields, including housing markets, land use, population ecology, seismology, and mining research. These models are valuable for capturing the spatial heterogeneity of coefficient values. In many application areas, the continuous expansion of spatial data sample sizes, in terms of both volume and richness of explanatory variables, has given rise to new methodological challenges. The primary issues revolve around the time required to calculate each local coefficient and the memory requirements imposed for storing the large hat matrix (of size \(n \times n\)) for parameter variance estimation. Researchers have explored various approaches to address these challenges (Harris et al. in Trans GIS 14:43–61, 2010; Pozdnoukhov and Kaiser in: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011; Tran et al. in: 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE), IEEE, 2016; Geniaux and Martinetti in Reg Sci Urban Econ 72:74–85, 2018; Li et al. in Int J Geogr Inf Sci 33:155–175, 2019; Murakami et al. in Ann Am Assoc Geogr 111:459–480, 2020). While the use of a subset of target points for local regressions has been extensively studied in nonparametric econometrics, its application within the context of GWR has been relatively unexplored. In this paper, we propose an original two-stage method designed to accelerate GWR computations. We select a subset of target points based on the spatial smoothing of residuals from a first-stage regression, conducting GWR solely on this subsample. Additionally, we propose an original approach for extrapolating coefficients to non-target points. In addition to using an effective sample of target points, we explore the computational gain provided by using a truncated Gaussian kernel to create sparser matrices during computation. Our Monte Carlo experiments demonstrate that this method of target point selection outperforms methods based on point density or random selection. The results also reveal that using target points can reduce bias and root mean square error (RMSE) in estimating \(\beta\) coefficients compared to traditional GWR, as it enables the selection of a more accurate bandwidth size. We demonstrate that our estimator is scalable and exhibits superior properties in this regard compared to the Murakami et al. (Ann Am Assoc Geogr 111:459–480, 2020) estimator under two conditions: the use of a ratio of target points that provides a satisfactory approximation of coefficients (10–20% of locations) and an optimal bandwidth that remains within a reasonable neighborhood (\(<5000\) neighbors). All the GWR estimators with target points are now accessible in the R package mgwrsar, for GWR and mixed GWR with and without spatial autocorrelation, available in the CRAN repository at https://CRAN.R-project.org/package=mgwrsar.
PubDate: 2024-07-02 DOI: 10.1007/s10109-024-00442-3
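The authors' implementation is the R package mgwrsar; the Python sketch below only illustrates one plausible reading of the target-point selection step (first-stage global regression, spatial smoothing of absolute residuals, keep the locations where smoothed residuals are largest). The data, the k-NN smoother, and the 10% target ratio are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)

# Synthetic data with a spatially varying coefficient on x.
n = 1000
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
beta = 1 + np.sin(coords[:, 0] / 2)      # coefficient varies in space
y = beta * x + rng.normal(0, 0.3, size=n)

# Stage 1: global OLS, then smooth the absolute residuals over the k
# nearest neighbours of each location.
resid = y - LinearRegression().fit(x[:, None], y).predict(x[:, None])
k = 20
_, idx = NearestNeighbors(n_neighbors=k).fit(coords).kneighbors(coords)
smoothed = np.abs(resid)[idx].mean(axis=1)

# Stage 2 (selection only): keep ~10% of locations with the largest
# smoothed residuals as target points; local regressions would be run
# only there, with coefficients extrapolated to the other locations.
n_targets = n // 10
targets = np.argsort(smoothed)[-n_targets:]
print(len(targets))
```

The intuition is that locations where the global model fits worst are where local coefficients deviate most, so spending the expensive local regressions there captures the heterogeneity at a fraction of the cost of fitting all \(n\) locations.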
- Rethinking the null hypothesis in significant colocation pattern mining of spatial flows
Abstract: Spatial flows represent spatial interactions or movements. Mining colocation patterns of different types of flows may uncover the spatial dependences and associations among flows. Previous studies proposed a flow colocation pattern mining method and established a significance test under the null hypothesis of independence for the results. In fact, the definition of the null hypothesis is crucial in significance testing. Choosing an inappropriate null hypothesis may lead to misunderstandings about the spatial interactions between flows. In practice, the overall distribution patterns of different types of flows may be clustered. In these cases, the null hypothesis of independence will result in unconvincing results. Thus, considering the overall spatial pattern of flows, in this study, we changed the null hypothesis to random labeling to establish the statistical significance of flow colocation patterns. Furthermore, we compared and analyzed the impacts of different null hypotheses on flow colocation pattern mining through synthetic data tests with different preset patterns and situations. Additionally, we used empirical data from ride-hailing trips to show the practicality of the method.
PubDate: 2024-05-03 DOI: 10.1007/s10109-024-00439-y
- Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data
Abstract: In today's era, the address plays a crucial role as one of the key components that enable mobility in daily life. Address data are used by global map platforms and location-based services to pinpoint a geographically referenced location. Geocoding provided by online platforms is useful in the spatial tracking of reported cases and controls in the spatial analysis of infectious illnesses such as COVID-19. The first and most critical phase in the geocoding process is address matching. However, due to typographical errors, variations in abbreviations used, and incomplete or malformed addresses, the matching can seldom be performed with 100% accuracy. The purpose of this research is to examine the capabilities of machine learning classifiers that can be used to measure the consistency of address matching results produced by online geocoding services and to identify the best performing classifier. The performance of the seven machine learning classifiers was compared using several text similarity measures, which assess the match scores between the input address data and the services' output. The data utilized in the testing came from four distinct online geocoding services applied to 925 addresses in Türkiye. The findings from this study revealed that the Random Forest machine learning classifier was the most accurate in the address matching procedure. While the results of this study hold true for similar datasets in Türkiye, additional research is required to determine whether they apply to data in other countries.
PubDate: 2024-01-25 DOI: 10.1007/s10109-023-00435-8
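Text-similarity measures of the kind the abstract describes — scores comparing an input address against a geocoder's returned address, which would then feed a classifier — can be sketched with the Python standard library. The two measures below (character-level sequence ratio and token-level Jaccard) and the example addresses are assumptions for illustration; the study's exact measures are not listed in the abstract.

```python
from difflib import SequenceMatcher

def similarity_features(query: str, matched: str) -> dict:
    """Simple similarity scores between an input address and the
    address string returned by a geocoding service."""
    q, m = query.lower().strip(), matched.lower().strip()
    seq = SequenceMatcher(None, q, m).ratio()              # character-level
    qt, mt = set(q.split()), set(m.split())
    jaccard = len(qt & mt) / len(qt | mt) if qt | mt else 1.0  # token-level
    return {"sequence_ratio": round(seq, 3), "token_jaccard": round(jaccard, 3)}

# A well-matched pair scores high on both measures; typos and
# abbreviation variants lower the scores (addresses are invented).
good = similarity_features("12 Ataturk Cad. Ankara", "12 Ataturk Cad. Ankara")
noisy = similarity_features("12 Ataturk Cd Ankra", "12 Ataturk Cad. Ankara")
print(good, noisy)
```

In a full pipeline, feature vectors like these for each (input, service output) pair would be the inputs to the compared classifiers, with the consistency label as the target.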
- Analysis of a spatial point pattern in relation to a reference point
Abstract: This paper develops a new method for analyzing the relationship between a set of points and another single point, the latter of which we call a reference point. This relationship has been discussed in various academic fields, such as geography, criminology, and epidemiology. Analytical methods, however, have not yet been fully developed, which has motivated this paper. Our method reveals how the number of points varies by the distance from a reference point and by direction. It visualizes the spatial pattern of points in relation to a reference point, describes the point pattern using mathematical models, and statistically evaluates the difference between two sets of points. We applied the proposed method to analyze the spatial pattern of the climbers of Mt. Azuma, Japan. The result gave us useful and interesting findings, indicating the method’s soundness.
PubDate: 2024-01-19 DOI: 10.1007/s10109-023-00434-9
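The descriptive core of the method above — counting points by distance band and by direction sector around a reference point — can be sketched in a few lines of Python. The point cloud, band edges, and 45-degree sectors below are invented for illustration and stand in for the paper's mathematical models and statistical tests, which go well beyond this tabulation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sightings around a reference point at the origin.
pts = rng.normal(loc=0.0, scale=2.0, size=(500, 2))
ref = np.array([0.0, 0.0])

# Distance and direction of every point relative to the reference.
vec = pts - ref
dist = np.linalg.norm(vec, axis=1)
angle = np.degrees(np.arctan2(vec[:, 1], vec[:, 0])) % 360

# Counts by distance band and by 45-degree direction sector give a
# simple distance-and-direction profile of the point pattern.
dist_profile, _ = np.histogram(dist, bins=[0, 1, 2, 4, 8, np.inf])
dir_profile, _ = np.histogram(angle, bins=np.arange(0, 361, 45))
print(dist_profile, dir_profile)
```

Comparing two such profiles (e.g. climbers on two routes) is where a statistical test of the kind the paper develops would come in; the tabulation alone only describes each pattern.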