Industry Perspective: Tree-Based Models vs Deep Learning for Tabular Data
For tabular data, gradient boosted trees (GBTs) typically outperform neural networks (NNs). This is common knowledge amongst data scientists in industry, and has more recently been evidenced in a comparative study. But performance isn't the primary reason why GBTs are preferred for tabular prediction tasks. GBTs...
- ...can handle missing data without imputation (demonstrated in the sketch after this list)
- ...can handle (high cardinality) categorical data without preprocessing
- ...tolerate any distribution of data, regardless of whether it's skewed, heteroscedastic, or multimodal
- ...are simple to regularise to prevent overfitting
- ...have a smaller hyperparameter space than NNs. Default GBT hyperparameters will likely produce good results; a bespoke set of tuned parameters will likely transfer well across similar datasets for a given use case.
- ...are faster to train and require less compute than NNs
- ...are easier to explain and interpret than NNs. Exact SHAP values can be computed efficiently for GBTs via TreeSHAP; for NNs, they can only be approximated.
- ...have large communities of industry practitioners, mostly using a handful of mature implementations, sharing best practices and advice
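As a minimal sketch of the first few points, assuming LightGBM (one of the mature implementations alluded to above) and the shap package, here's a toy example with invented data and column names: missing values are left as NaN, a high-cardinality categorical needs no encoding, the hyperparameters are left at their defaults, and exact SHAP values come straight from the tree structure.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
n = 1_000

# Toy frame: a numeric feature with ~20% missing values left as NaN, and
# a high-cardinality categorical stored as a pandas "category" column.
df = pd.DataFrame({
    "age": rng.normal(50, 15, n),
    "clinic_id": pd.Categorical(rng.integers(0, 500, n).astype(str)),
})
df.loc[rng.random(n) < 0.2, "age"] = np.nan  # no imputation
y = rng.integers(0, 2, n)  # random labels, purely illustrative

# Default hyperparameters; "category" columns and NaNs are handled
# natively, with no encoding or preprocessing.
model = lgb.LGBMClassifier()
model.fit(df, y)

# Exact SHAP values, computed efficiently via TreeSHAP.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(df)
```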
Data isn't inherently tabular
This article isn't about why GBTs outperform NNs on tabular data - you should read the paper for that. Something that often gets overlooked, though, is that data isn't inherently tabular: tables are just an extremely simple and common way of storing information. If you elect to structure your data as a different representation, you might flip the performance of these algorithms, and unlock the power of NNs.
As a contrived example: you can store a set of 256x256 pixel, greyscale images in a table, where each row corresponds to an image, and 65,536 columns capture the pixel intensities. On this data, GBTs might outperform NNs. However, if you fed these images into a convolutional NN as 2D arrays, you would achieve vastly superior results.
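To make that concrete, here's a sketch of the same (randomly generated) pixel data under both representations; only the shapes matter here.

```python
import numpy as np

n_images = 100

# Tabular view: one row per image, 65,536 pixel-intensity columns.
flat = np.random.rand(n_images, 256 * 256)

# The identical data, restructured for a convolutional NN as 2D arrays,
# with shape (batch, height, width, channels).
images = flat.reshape(n_images, 256, 256, 1)
```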
When deep learning makes sense
If your data can be restructured meaningfully in a non-tabular way, and improving performance matters, you may want to consider NNs if you're willing and able to...
- ...rewrite your entire ETL pipeline
- ...preprocess your data extensively
- ...train and tune finicky NN-based algorithms, and carefully monitor their inputs and outputs
- ...increase your spend on computational resources
- ...live with a fragile "black box" model
None of this sounds very attractive, but there are plenty of use cases where even slightly besting the performance you can eke out of GBTs will justify the increased cost and complexity.
An industry case study
At a previous employer of mine, IQVIA, we trained GBTs on multimillion-patient-scale, real-world healthcare data to identify patients with undiagnosed rare diseases.
The data was wide, sparse, and tabular, with domain-specific features engineered using bespoke in-house tooling. These features were based on medically relevant clinical codes, capturing things like the counts of important events in a patient's medical history, and the times since their last occurrence.
We trained and evaluated GBTs on this tabular data using a rolling cross-section methodology (sketched below). Doing the same with NNs did not yield better results.
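The production pipeline itself isn't something I can share, but as a rough, hypothetical illustration of the idea: a rolling cross section tabulates each patient's history as of a series of index dates, trains on the earlier cross sections, and holds out the later ones. The toy events frame, column names, and dates below are all invented.

```python
import pandas as pd

# Toy event-level data: one row per occurrence of a clinical code.
events = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "code": ["x", "x", "x"],
    "date": pd.to_datetime(["2018-05-01", "2018-11-15", "2019-03-20"]),
})

def cross_section(events: pd.DataFrame, index_date: pd.Timestamp) -> pd.DataFrame:
    """Tabulate one cross section from history strictly before the index date."""
    history = events[events["date"] < index_date]
    feats = history.groupby("patient_id").agg(count_of_x=("code", "size"))
    feats["time_since_last_x"] = (
        index_date - history.groupby("patient_id")["date"].max()
    ).dt.days
    return feats

index_dates = pd.to_datetime(["2019-01-01", "2019-07-01", "2020-01-01"])

# Train on the earlier cross sections, evaluate on the most recent one.
# Labels (e.g. diagnoses observed after each index date) are omitted here.
train = pd.concat(cross_section(events, d) for d in index_dates[:-1])
test = cross_section(events, index_dates[-1])
```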
An alternative approach would be to structure the data as sequences. Rather than having tabulated features like count_of_x and time_since_last_x for each clinically relevant diagnosis or procedure, each patient's data could be structured temporally as a timeline of events.
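Sticking with the toy events frame from the previous sketch, the same data might look like this as per-patient timelines (again a hypothetical layout; real event sequences would carry far richer structure):

```python
# Reusing the toy `events` frame from the sketch above: the same data,
# restructured as one chronologically ordered timeline per patient,
# suitable for a sequence model instead of pre-aggregated features.
sequences = {
    pid: list(zip(group["code"], group["date"]))
    for pid, group in events.sort_values("date").groupby("patient_id")
}
# e.g. {1: [('x', Timestamp('2018-05-01')), ('x', Timestamp('2018-11-15'))],
#       2: [('x', Timestamp('2019-03-20'))]}
```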
Would it be worth it to rearchitect the entire pipeline to use neural networks with sequential data? In this case, no.
For the questions we were answering, the performance of GBTs was good enough. The unguaranteed performance improvement that NNs may have provided wouldn't justify the risk and considerable effort required to retool years' worth of development. The opportunity cost would be huge, consuming resources that could be invested into improving the service in other ways, or maximising the delivery of the existing solution.
Conclusion
As the field of deep learning advances, NNs may eventually become state-of-the-art on tabular data. In fact, I wouldn't be surprised if they already are, behind the closed doors of algorithmic trading firms.
GBTs will remain an attractive option in industry, due to the plethora of advantages they offer. Predictive performance is only one aspect of a machine learning system: for NNs to see broader application on tabular data, they will also need to improve in other ways. Every design choice is a tradeoff, and maximising performance needs to be balanced with building a robust and maintainable system.