[Data Science Project] What Drives Intra-State Migration? A Network Regression Analysis on Florida Counties

Introduction

Understanding internal migration patterns within a state offers insight into social equity, economic opportunity, and regional planning. In this blog post, I analyze county-to-county migration flows in Florida between 2016 and 2020 using network-based regression methods. This project extends my previous post, so if you are interested in the earlier intra-Florida migration analysis, please refer to this link. Specifically, I applied the Multiple Regression Quadratic Assignment Procedure (MR-QAP) to identify which socioeconomic and spatial factors influence migration ties.

Data Overview

  • Migration Flows: Origin-destination data among Florida counties (2016–2020), filtered to include only flows with weight > 0.

  • County Attributes: Socioeconomic indicators like median income, racial composition, education, employment, commute time, and housing age from the U.S. Census.

  • Geometry: Florida county shapefiles for spatial network construction.

Each county is treated as a node, and migration flow is a weighted edge in a directed graph.
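
To make the node-and-edge setup concrete, here is a minimal sketch of turning origin-destination records into a directed, weighted flow matrix. The records are made-up toy values, not the actual Florida data, and the degree centrality formula (in-degree plus out-degree, divided by n - 1) is my assumption about a common definition rather than a statement of exactly what was computed for this post.

```python
import numpy as np

# Hypothetical toy origin-destination records: (origin, destination, movers).
flows = [
    ("Alachua", "Duval", 120),
    ("Duval", "Alachua", 95),
    ("Duval", "Orange", 310),
    ("Orange", "Alachua", 0),  # zero flow, dropped by the weight > 0 filter
]

# Each county becomes a node with a fixed row/column index.
counties = sorted({c for origin, dest, _ in flows for c in (origin, dest)})
index = {name: i for i, name in enumerate(counties)}
n = len(counties)

# Directed weighted adjacency matrix: Y[i, j] = movers from county i to county j.
Y = np.zeros((n, n))
for origin, dest, movers in flows:
    if movers > 0:
        Y[index[origin], index[dest]] = movers

# Degree centrality (assumed definition): number of incoming plus outgoing
# ties, normalized by the number of other counties.
ties = (Y > 0).astype(int)
degree_centrality = (ties.sum(axis=0) + ties.sum(axis=1)) / (n - 1)
```

The row index is the origin and the column index is the destination, so the matrix is not symmetric in general.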

Methodology: MR-QAP

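MR-QAP is a network regression model for dyadic data. Because every county appears in many origin-destination pairs, the observations are not independent, so ordinary regression p-values are unreliable; MR-QAP instead assesses significance by permuting the rows and columns of the network matrix. Here’s what I did: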

  1. Dependent Matrix: Migration flow matrix between 67 Florida counties.

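  2. Independent Matrices: One pairwise matrix per predictor. Each holds the absolute difference between two counties in an attribute, except geographic distance, which is already a pairwise quantity: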

  • Median income

  • % White population

  • % African-American population

  • Degree centrality (connectedness)

  • % Employed

  • Average commute time

  • Median year built of housing

  • County area

  • Geographic distance between centroids

  3. Permutation Testing: 1,000 permutations were run to compute p-values.
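
The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the exact code behind the results in this post: it uses the simple variant that permutes the dependent matrix (other variants, such as double semi-partialing, exist), it counts the observed statistic as one of the permutations when computing p-values, and the function names `abs_diff_matrix`, `ols_offdiag`, and `mrqap`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def abs_diff_matrix(values):
    """Pairwise absolute differences |v_i - v_j| for a county attribute."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def ols_offdiag(Y, Xs):
    """OLS of the off-diagonal cells of Y on the off-diagonal cells of each X."""
    n = Y.shape[0]
    mask = ~np.eye(n, dtype=bool)  # ignore self-flows on the diagonal
    y = Y[mask]
    X = np.column_stack([np.ones(y.size)] + [X_[mask] for X_ in Xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def mrqap(Y, Xs, n_perm=1000, seed=0):
    """Y-permutation QAP: shuffle county labels of Y, refit, compare."""
    rng = np.random.default_rng(seed)
    beta_obs, r2_obs = ols_offdiag(Y, Xs)
    n = Y.shape[0]
    exceed = np.zeros_like(beta_obs)
    r2_perm = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        Y_perm = Y[np.ix_(p, p)]  # same permutation for rows and columns
        beta_p, r2_perm[k] = ols_offdiag(Y_perm, Xs)
        exceed += np.abs(beta_p) >= np.abs(beta_obs)
    # Two-sided p-values; the +1 counts the observed fit as one permutation.
    p_values = (exceed + 1) / (n_perm + 1)
    return beta_obs, p_values, r2_obs, r2_perm

# Synthetic example (NOT the Florida data): flows decay with distance.
rng = np.random.default_rng(42)
n = 20
income = rng.normal(50_000, 8_000, n)
coords = rng.uniform(0, 100, (n, 2))
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
Y = np.maximum(0, 500 - 4 * dist + rng.normal(0, 30, (n, n)))
np.fill_diagonal(Y, 0)

beta, p, r2, r2_perm = mrqap(Y, [dist, abs_diff_matrix(income)], n_perm=200)
```

Here `beta[0]` is the intercept, `beta[1]` the distance coefficient, and `beta[2]` the income-difference coefficient. Permuting rows and columns with the same index vector keeps each county's full set of flows together, which is what preserves the dependence structure within the network.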

Results

Variable                        Coefficient (β)    p-value
Geographic Distance                  -39.0901       0.0080
Median Income Difference              -0.0017       0.4186
% White Difference                    -2.3064       0.3477
% African-American Difference         +5.5234       0.0959
Degree Centrality Difference          +0.0061       0.0180
Employment Rate Difference            -5.2510       0.8112
Commute Time Difference               +9.2089       0.2907
Median Built Year Difference          +7.7797       0.1249
Area Difference                       +0.2053       0.0180

  • R² (model): 0.0183

  • R² (mean of permutations): 0.0052

Interpretation

  • Distance matters. Counties farther apart exchange fewer migrants (β = -39.0901, p = 0.0080).

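  • Network centrality matters. The coefficient on the degree centrality difference is positive (β = +0.0061, p = 0.0180), so flows tend to be larger between counties whose degree differs, for example between a well-connected hub and a less connected county, rather than between counties of similar degree.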

  • Area difference also shows a small but significant positive effect (β = +0.2053, p = 0.0180).

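  • Differences in income, racial composition, employment, commute time, and housing age are not significant at the 0.05 level. The % African-American difference comes closest (p = 0.0959). This suggests spatial structure dominates over socioeconomic similarity.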

Visualization

Permutation Distribution

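The observed R² of 0.0183 lies in the upper tail of the permutation distribution (mean R² of 0.0052). The model explains less than 2% of the variance in flows, so its explanatory power is modest, but it is clearly larger than what the permuted networks produce.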
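
How far into the upper tail a value lies can be summarized as the share of permuted R² values at or above the observed one. The sketch below shows that calculation; the function name `upper_tail_p` is hypothetical, and the null values are toy numbers chosen only to illustrate the arithmetic, not the permutation results from this analysis.

```python
import numpy as np

def upper_tail_p(observed, null_values):
    """Share of permuted statistics at or above the observed one.

    The +1 in numerator and denominator counts the observed value itself
    as one draw, which avoids reporting a p-value of exactly zero.
    """
    null_values = np.asarray(null_values, dtype=float)
    return (np.sum(null_values >= observed) + 1) / (null_values.size + 1)

# Toy null distribution (illustrative values only).
toy_null = [0.002, 0.004, 0.005, 0.007, 0.009, 0.011, 0.020, 0.003, 0.006]
p_r2 = upper_tail_p(0.0183, toy_null)  # one of nine toy values is >= 0.0183
```

With a real run, `null_values` would be the R² from each of the 1,000 permuted fits.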

Conclusion

This MR-QAP analysis reveals that physical proximity and network centrality are stronger predictors of county-to-county migration flows in Florida than demographic or economic similarity. This finding aligns with gravity-based theories of migration and has implications for infrastructure and service planning at the state level.

Thank you for reading!

비모

Want to see more posts like this combining Python, mapping, and regression modeling? Hit like and subscribe to follow my data science journey!
