Understanding transit ridership in an equity context through a comparison of statistical and machine learning algorithms

Inferring Socioeconomic Characteristics from Travel Patterns

Journal of Regional and City Planning

Crowd-based big data is now widely used in transportation planning. These data sources provide valuable information for model validation; however, they cannot be used to estimate travel demand forecasting models, because such models need a link between travel patterns and the socioeconomic characteristics of the people making trips, and that link is unavailable due to privacy concerns. Uncovering the correlation between travel patterns and socioeconomic characteristics is therefore crucial if travel demand modelers are to leverage such data in model estimation. Different age, gender, and income groups may have specific travel behavior preferences. To extract and investigate these patterns, we used two data sets: one from the National Household Travel Survey 2009 and the other from the Metropolitan Washington Council of Governments Transportation Planning Board 2007-2008 household survey. After preprocessing the data, a range of machine learning algorithms were...

Modeling commuter's sociodemographic characteristics to predict public transport usage frequency by applying supervised machine learning method

Research article, 2019

Predictive modeling is a key method for studying passenger behavior in transportation research. One understudied topic is the modeling of public transport usage frequency, which can be used to estimate present and future demand and users' trends toward public transport services. Artificial intelligence and machine learning methods are promising substitutes for statistical techniques. Traditional econometric models are better suited to studying causal relationships among variables, but they make rigid assumptions and are unable to recognize patterns in data. This paper aims to build a predictive model for passenger classification and public transport usage frequency using socio-demographic survey data. The supervised machine learning algorithm K-Nearest Neighbor (KNN) is applied to build the predictive model; it suits small datasets because it requires little parameter tuning. Survey data were used to train and validate the model, which can predict the public transport usage frequency of future public transport users. Public transport agencies and relevant government organizations can use this model in practice to predict public transport demand among new commuters before introducing new transportation projects.
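To make the KNN idea in the abstract above concrete, the following is a minimal sketch and not the authors' implementation. The function name `knn_predict`, the two features, the labels, and all data values are hypothetical and chosen only for illustration; the abstract does not state which features or distance metric were used, so Euclidean distance is an assumption.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example (assumed metric).
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Sort by distance only, then keep the labels of the k closest examples.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical respondents: (age in decades, cars in household).
train_X = [(2.0, 0), (2.5, 0), (3.0, 1), (4.5, 2), (5.0, 2), (5.5, 1)]
# Hypothetical usage-frequency classes.
train_y = ["frequent", "frequent", "frequent", "rare", "rare", "rare"]

prediction = knn_predict(train_X, train_y, query=(2.2, 0), k=3)
print(prediction)
```

The only tunable parameter here is `k`, which is one way to read the abstract's remark about limited parameter tuning. In practice the features would usually be rescaled first so that no single variable dominates the distance.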

Reconstructing commuters network using machine learning and urban indicators

Scientific Reports, 2019

Human mobility has a significant impact on several layers of society, from infrastructural planning and economics to the spread of diseases and crime. Representing the system as a complex network, in which nodes are assigned to regions (e.g., a city) and links indicate the flow of people between two of them, physics-inspired models have been proposed to quantify the number of people migrating from one city to the other. Despite the advances made by these models, our ability to predict the number of commuters and reconstruct mobility networks remains limited. Here, we propose an alternative approach using machine learning and 22 urban indicators to predict the flow of people and reconstruct the intercity commuters network. Our results reveal that predictions based on machine learning algorithms and urban indicators can reconstruct the commuters network with 90.4% accuracy and describe 77.6% of the variance observed in the flow of people between cities. We also identify essential features to recover the network structure and the urban indicators most closely related to commuting patterns. As previously reported, distance plays a significant role in commuting, but other indicators, such as Gross Domestic Product (GDP) and unemployment rate, are also driving forces for people to commute. We believe that our results shed new light on the modeling of migration and reinforce the role of urban indicators in commuting patterns. Also, because link prediction and network reconstruction are still open challenges in network science, our results have implications in other areas, like economics, social sciences, and biology, where node attributes can give us information about the existence of links connecting entities in the network.

Non-Linear Associations Between the Urban Built Environment and Commuting Modal Split: A Random Forest Approach and SHAP Evaluation

IEEE Access

The study of commuting mode choice is crucial since driving, with all its associated environmental and economic consequences, is the United States' most popular mode of transportation due to urban sprawl, priority given to road construction, and America's love affair with the automobile. More attention needs to be paid to sustainable modes such as public transit and walking. The built environment is expected to have an impact on commuting mode choice. Built environments with higher density, diversity, intentional design, destination accessibility, and shorter distance to transit (collectively known as the 5 Ds of the built environment) are hypothesized to lead to more sustainable mode choices, including public transit and walking. In this paper, we evaluate the impact of built environment variables on commuting modal split, including the four modes of public transit-bus, public transit-rail, walking, and driving. The study is conducted in Mecklenburg County, North Carolina, at the geographic level of census block groups in the year 2015. Given the complexity of relationships in the built environment-travel behavior subject, the random forest method is used to predict aggregated commuting mode choice. Random forest is employed as it is capable of capturing nonlinear relationships and is not constrained by limitations in other widely used methods, such as multinomial logistic regression. After predicting the commuting mode shares, SHAP values (SHapley Additive exPlanations) are used to evaluate the impact of the built environment on commuting mode choices. As an advanced machine learning method, SHAP adds explainability to the model. It addresses the known limitation of machine learning methods as "black boxes" and turns them into "white boxes" by providing interpretability. SHAP values provide insights into both the direction and magnitude of the relationships.
Thanks to its rigorous ML-based design, our study helps to solidify the state of knowledge with strong evidence that block groups with higher degrees of the 5Ds lead to more choices of public transit and walking modes. We discuss urban policy implications of this study.
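The attribution idea behind SHAP can be illustrated with exact Shapley values on a tiny model. This is a sketch under stated assumptions and not the study's pipeline: it enumerates every feature subset by brute force rather than using the SHAP library or a random forest, it treats a feature as "absent" by substituting a single baseline value, and `toy_transit_share`, its coefficients, and the two inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    A feature that is 'absent' from a coalition takes its baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_transit_share(features):
    """Hypothetical nonlinear model with inputs (density, land-use mix)."""
    density, mix = features
    return 0.02 * density + 0.05 * mix + 0.01 * density * mix


x = [10.0, 2.0]          # hypothetical block group
baseline = [0.0, 0.0]    # hypothetical reference block group
phi = shapley_values(toy_transit_share, x, baseline)
print(phi)
```

The sign of each value gives the direction of a feature's contribution and its size gives the magnitude. The values sum to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term is split evenly between the two features.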

Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use

2019

Prediction of daily transportation mode use (car, public transit, or active travel) is an important task in transportation research. Unlike statistical models that impose a predetermined model structure, machine learning models are learned from the data, making them more flexible with higher prediction accuracy. However, prediction of mid- to long-term habitual modes still largely relies on traditional statistical analysis using small samples of cross-sectional data. Low interpretability of "black-box" machine learning models limits their usefulness for generating the behavioral insights needed to design appropriate interventions. This paper, leveraging a set of unique longitudinal life course data, is the first use case to demonstrate machine learning methods applied for both predicting and interpreting regularly used travel modes. We combine sequence clustering and tree-based machine learning methods coupled with TreeExplainer to predict and interpret habitual travel modes using mid- to long-term predictors. Five life course clusters are derived to provide evaluation and interpretation contexts. This allows us to improve upon a recently developed TreeExplainer method to better distinguish predictor importance locally and globally, as well as predictor interactions across subpopulations within distinctive life history contexts. Our results demonstrate a promising step toward interpretable machine learning applications to mid- to long-term prediction of travel modes for transportation planning.

Investigating machine learning for simulating urban transport patterns: A comparison with traditional macro-models

Predicting passenger flow within a city is crucial for intelligent transportation management systems, especially in the context of urban development, post-pandemic policy changes, and infrastructure improvements. Traditional macro models have limitations in accurately capturing the complex structure of real traffic flows, and recent advancements in machine learning offer promising approaches for improving transportation simulations. This research aims to compare the effectiveness of traditional simulation models with a selective machine learning (ML) model for traffic flow prediction in Oslo, Norway. Sensitivity and scenario analyses are conducted to examine the models' parameters and derive the city's characteristics. Results substantiate that the traditional Spatial Interaction model (SIM), although interpretable and requiring fewer parameters, has limitations in accurately capturing real flow structures and exhibits greater variability compared to the ML model. Statistical analyses support these findings and raise questions about the validity of the ML model's results over the SIM. The research highlights the potential of ML models to identify trends in passenger flows and simulate traffic flows in different scenarios related to city development. Overall, the research presents a decision support system for planners and policymakers to predict traffic flow accurately and efficiently. It highlights the benefits and drawbacks of both the traditional SIM and ML models, contributing to the ongoing discussion of the role of machine learning in transportation modeling.
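For readers unfamiliar with spatial interaction models, here is a minimal sketch of one common variant, an origin-constrained gravity model with exponential distance decay. The abstract does not specify which SIM formulation the study used, so this form is an assumption, and the function name, the decay parameter `beta`, and all zone data are hypothetical.

```python
import math


def origin_constrained_flows(origins, attractions, distances, beta):
    """Distribute each origin's trips across destinations.

    T_ij = O_i * A_j * exp(-beta * d_ij) / sum_k(A_k * exp(-beta * d_ik))
    """
    flows = []
    for i, o_i in enumerate(origins):
        # Unnormalised pull of each destination as seen from origin i.
        weights = [
            a_j * math.exp(-beta * distances[i][j])
            for j, a_j in enumerate(attractions)
        ]
        total = sum(weights)
        # Normalise so that the row sums back to the origin's trip total.
        flows.append([o_i * w / total for w in weights])
    return flows


# Hypothetical zones: trips leaving 2 origins, attractiveness of 3 destinations.
origins = [100.0, 200.0]
attractions = [50.0, 150.0, 100.0]
distances = [[1.0, 4.0, 6.0],
             [5.0, 2.0, 3.0]]  # km, hypothetical

flows = origin_constrained_flows(origins, attractions, distances, beta=0.3)
for row in flows:
    print([round(t, 1) for t in row])
```

The single parameter `beta` controls how quickly flows fall off with distance, which reflects the interpretability and small parameter count the abstract attributes to the SIM. With `beta=0`, distance is ignored and trips are split in proportion to attractiveness alone.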

Who Can Access Transit? Reviewing Methods for Determining Population Access to Bus Transit

SpAM (Spatial Analysis and Methods) presents short articles on the use of spatial statistical techniques for housing or urban development research. Through this department of Cityscape, the Office of Policy Development and Research introduces readers to the use of emerging spatial data analysis methods or techniques for measuring geographic relationships in research data. Researchers increasingly use these new techniques to enhance their understanding of urban patterns but often do not have access to short demonstration articles for applied guidance. If you have an idea for an article of no more than 3

Exploring Bus Rapid Transit passenger travel behaviour using big data

2014

Over the past two decades, a growing international trend has been the implementation of Bus Rapid Transit (BRT) as a cost-effective way to enhance urban public transport (UPT) service quality and progress towards sustainable urban transport. To date, in excess of 40 cities worldwide operate BRT within their UPT networks. Despite the international prominence of BRT systems, there has been limited in-depth investigation of their spatial-temporal dynamics. Drawing on a case study BRT system in Brisbane, Australia, we apply a geo-visualisation-based method to a large smart card database to examine spatial-temporal dynamics. Conditional flow-maps (or flow-comaps) are created to visually compare the spatial trajectories of BRT trips and other bus-based trips and their variation by calendar events (i.e., a workday, a weekend, a school holiday and a public holiday). The results highlight (1) marked differences between BRT-based trips and bus trips undertaken on the broader UPT network; (2) spatial heterogeneity in BRT trips; and (3) the potential of drawing on 'big data' to support evidence-based BRT planning. These findings have important implications for future BRT strategy as it relates to service management and infrastructure expansion.

Can transit investments in low-income neighbourhoods increase transit use? Exploring the nexus of income, car-ownership, and transit accessibility in Toronto

Transportation Research Part D: Transport and Environment, 2021

Transportation equity advocates recommend improving public transit in low-income neighbourhoods to alleviate socio-spatial inequalities and increase quality of life. However, transportation planners often overlook transit investments in neighbourhoods with “transit-captive” populations because they are assumed to result in less mode-shifting, congestion relief, and environmental benefits, compared to investments that aim to attract choice riders in wealthier communities. In North American cities, while many low-income households are already transit users, some also own and use private vehicles. This suggests that transit improvements in low-income communities could indeed result in more transit use and less car use. Accordingly, the main objective of this article is to explore the statistical relationship between transit use and transit accessibility as well as how this varies by household income and vehicle ownership in the Greater Toronto and Hamilton Area (GTHA). Using stratified regression models, we find that low-income households with one or more cars per adult have the most elastic relationship between transit accessibility and transit use; they are more likely to be transit riders if transit improves. However, we confirm that in auto-centric areas with poor transit, the transit use of low-income households drops off sharply as car ownership increases. On the other hand, a sensitivity analysis suggests more opportunities for increasing transit ridership among car-deficit households when transit is improved. These findings indicate that improving transit in low-income inner suburbs, where most low-income car-owning households are living, would align social with environmental planning goals.