Adding neural networks to the flight prediction model

If you saw my previous post, you may have noticed that I used a regression to build the flight prediction model. Regressions, of course, are inherently limited: they predict future events by fitting past events to a function whose form is chosen in advance. Artificial neural networks are very interesting because it has been shown that (at least theoretically) a large enough network can approximate any continuous function over a bounded range of inputs, so I wanted to give this method a try in my model!

I've updated the visualization from my last post to include two methods of prediction for any route: the same regression-based model from the last post and a new neural-network-based prediction. I'll briefly discuss my methodology and compare the two methods, but check out the visualization and new model below first. It works just like the previous map, but this time it shows the two predictions side by side.

Before comparing the results, I'll quickly discuss my methodology (feel free to take a look at the Jupyter notebook here). I used the same variables as the regression, and similarly limited the model to routes with fewer than 35,000 passengers in the quarter. I scaled the data using standard min-max scaling and split it into training and testing sets. To actually build the artificial neural network, I used TensorFlow and Keras in Python. I tried a variety of models with different sizes, activation functions, and optimizers before landing on the model shown below, which gave the best results that I could reliably reproduce. As you can see in the table below, it has four hidden dense layers with 16, 16, 64, and 16 neurons, followed by a single output neuron.
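The two preprocessing steps just described, min-max scaling and a train/test split, can be sketched in a few lines of NumPy. This is only an illustration on made-up data: the 80/20 split, the random seeds, and the helper names are assumptions, and the nine input columns are inferred from the 160 parameters in the first layer of the model summary (16 * (9 + 1) = 160).

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of X to the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns become 0 instead of dividing by zero
    return (X - col_min) / col_range

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in data: 100 routes with 9 input variables (values are made up)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1000, size=(100, 9))
y = rng.uniform(0, 35000, size=100)  # passengers in the quarter, capped as in the post

X_scaled = min_max_scale(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)
print(X_train.shape, X_test.shape)
```

This follows the order described above (scale, then split). A common variation is to compute the column minimums and maximums on the training rows only and reuse them for the test rows.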

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1011 (Dense)           (None, 16)                160       
_________________________________________________________________
dense_1012 (Dense)           (None, 16)                272       
_________________________________________________________________
dense_1013 (Dense)           (None, 64)                1088      
_________________________________________________________________
dense_1014 (Dense)           (None, 16)                1040      
_________________________________________________________________
dense_1015 (Dense)           (None, 1)                 17        
=================================================================
Total params: 2,577
Trainable params: 2,577
Non-trainable params: 0
_________________________________________________________________
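A model with the layer sizes in the summary above could be defined in Keras roughly as follows. The input width of 9 is inferred from the first layer's 160 parameters (16 * (9 + 1) = 160). The ReLU activations, the Adam optimizer, and the MSE loss are placeholders, since this post doesn't pin them down; the notebook has the actual choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Layer widths match the summary; activations and optimizer are assumed.
model = keras.Sequential([
    keras.Input(shape=(9,)),              # 9 input variables (inferred from 160 params)
    layers.Dense(16, activation="relu"),  # 9 * 16 + 16 = 160 params
    layers.Dense(16, activation="relu"),  # 16 * 16 + 16 = 272 params
    layers.Dense(64, activation="relu"),  # 16 * 64 + 64 = 1088 params
    layers.Dense(16, activation="relu"),  # 64 * 16 + 16 = 1040 params
    layers.Dense(1),                      # 16 * 1 + 1 = 17 params, single prediction
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Whatever the activations are, the parameter counts depend only on the layer widths, so this sketch should report the same 2,577 total as the table.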

The neural network converged within about 75 epochs and had a testing mean squared error (MSE) of 4454136. I then used the same data to re-run the regression, which had a testing MSE of 4807240, so the neural network's MSE is roughly 7% lower. Of course, a lower MSE doesn't necessarily mean better results. It's all about whether the predictions make sense in context, so we'll compare results for ten randomly chosen routes and see how each model does on a small sample. As a note, the neural network will never give an answer lower than 35 because that seems to be the global minimum of this network's output (I'm still thinking about why). This is noted in the model, and such predictions appear as <35 in the table below.
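For anyone who wants to check the arithmetic behind the comparison, here is a small sketch of how MSE is computed and how the two testing figures quoted above relate to each other. The helper names are mine, not from the notebook.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_improvement(baseline, candidate):
    """Fraction by which the candidate's error is lower than the baseline's."""
    return (baseline - candidate) / baseline

# Testing MSE figures quoted in the post
nn_mse = 4454136
regression_mse = 4807240

improvement = relative_improvement(regression_mse, nn_mse)
nn_rmse = math.sqrt(nn_mse)
regression_rmse = math.sqrt(regression_mse)

print(f"Relative MSE improvement: {improvement:.1%}")  # about 7.3%
print(f"RMSE: neural network {nn_rmse:.0f}, regression {regression_rmse:.0f}")
```

The root mean squared error (RMSE) is often easier to read than the MSE because it is in the same units as the quantity being predicted.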

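To look at results in context, I used a random number generator to pick a pair of airports ten times and then considered the results, shown below. In the table, Nonstop and Total are observed passengers per day each way (PDEW), and the two model columns are predicted PDEW.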

Origin	Destination	Nonstop	Total	Regression	Neural Network	Comments
RDU (Raleigh-Durham, NC)	MDT (Harrisburg, PA)	0	15	56	85	This seems like a lot of passengers given that the total PDEW is only 15. The neural network in particular thinks seats would fill simply because the route exists, more so than the regression.
PIT (Pittsburgh, PA)	LAS (Las Vegas, NV)	293	405	230	203	This route actually exists! The regression is slightly more accurate than the neural network here.
PBI (Palm Beach, FL)	GRK (Killeen, TX)	0	1	11	67	As before, the regression assumes that this route connecting two small airports will have very few passengers, whereas the neural network thinks it will fill somewhat simply because it exists. It's hard to predict what would actually happen if the route launched: it's hard to believe that only 11 people would board a flight each day, but 67 might be too high.
ORD (Chicago, IL)	FAR (Fargo, ND)	51	57	155	149	Both the regression and the neural network overestimate this route by a similar margin, perhaps because of ORD's sheer size.
RNO (Reno, NV)	FAY (Fayetteville, NC)	0	1	5	48	The neural network predicts about 50 passengers whereas the regression predicts only 5 PDEW. As before, the distinction between the models appears to be whether the mere existence of the route will increase demand.
DFW (Dallas, TX)	SEA (Seattle, WA)	953	1034	852	1011	Another existing route; the neural network gets slightly closer to the actual nonstop PDEW.
LSE (La Crosse, WI)	GTR (Columbus, MS)	0	0	2	<35	Both models predict extremely low traffic on this pretty obscure route.
ELP (El Paso, TX)	FCA (Kalispell, MT)	0	2	21	68	As before, we see higher estimates with the neural network than for the regression despite low observed traffic.
DTW (Detroit, MI)	STX (St. Croix, USVI)	0	5	40	114	We see similar results as before.
OKC (Oklahoma City, OK)	MSP (Minneapolis, MN)	55	72	111	112	We see almost exactly the same results (both overestimates) from the two models.
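For the four routes in the table that already have nonstop service, the comparison can be made a bit more concrete by measuring how far each prediction is from the observed nonstop PDEW. This is just a quick sketch using the numbers copied from the table; four routes is far too small a sample to rank the models.

```python
# (origin, destination, actual nonstop PDEW, regression, neural network),
# copied from the table above for routes with existing nonstop service
existing_routes = [
    ("PIT", "LAS", 293, 230, 203),
    ("ORD", "FAR", 51, 155, 149),
    ("DFW", "SEA", 953, 852, 1011),
    ("OKC", "MSP", 55, 111, 112),
]

ACTUAL, REGRESSION, NEURAL_NETWORK = 2, 3, 4  # column positions in each row

def mean_absolute_error(rows, column):
    """Average absolute gap between a model's prediction and the actual value."""
    return sum(abs(row[column] - row[ACTUAL]) for row in rows) / len(rows)

regression_mae = mean_absolute_error(existing_routes, REGRESSION)
nn_mae = mean_absolute_error(existing_routes, NEURAL_NETWORK)

print(f"Regression MAE: {regression_mae}")    # 81.0
print(f"Neural network MAE: {nn_mae}")        # 75.75
```

On these four routes the two models miss by similar amounts on average, and each one is closer on two of the four.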

There are two broad conclusions we can draw from this comparison:

The two models return relatively similar results for routes that are likely to be fairly well-trafficked.
For lower-demand routes, the neural network consistently expects significantly more passengers than the regression. This likely results from how the two models operate. The neural network seems to assume that, should a route be launched, demand will increase simply because the route exists, even between small airports. The regression, by contrast, predicts near-zero passengers per day because the airports themselves have very little traffic.
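The second conclusion can be checked against the table directly. The sketch below takes the routes with no observed nonstop service and computes how much higher the neural network's prediction is than the regression's. LSE-GTR is left out because its neural network value is only reported as <35.

```python
# (origin, destination, regression PDEW, neural network PDEW),
# copied from the table above for routes with a nonstop PDEW of 0
unserved_routes = [
    ("RDU", "MDT", 56, 85),
    ("PBI", "GRK", 11, 67),
    ("RNO", "FAY", 5, 48),
    ("ELP", "FCA", 21, 68),
    ("DTW", "STX", 40, 114),
]

# Gap between the two predictions for each route
gaps = {f"{origin}-{dest}": nn - reg for origin, dest, reg, nn in unserved_routes}
mean_gap = sum(gaps.values()) / len(gaps)

for route, gap in gaps.items():
    print(f"{route}: neural network is {gap} PDEW above the regression")
print(f"Mean gap: {mean_gap:.1f} PDEW")
```

In this small sample the neural network is higher on every one of these routes, by about 50 PDEW on average.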

As a result, the neural network may be more useful in situations (particularly for smaller routes) where we expect significant demand to be created by the route's existence, for example by drawing passengers away from other airports (a new route from Providence may steal passengers from Boston or Hartford). Additionally, the neural network has a lower overall MSE, which suggests a slightly better fit as well. Both models, of course, are useful in considering potential new routes and have different strengths and weaknesses, and I look forward to further exploring the benefits of each.

As always, let me know if you have any thoughts or questions!
