An Empirical Study on Used Car Price Prediction Using Supervised and Unsupervised Learning
DOI: https://doi.org/10.54097/74v0kt12
Keywords: Used Car Price Prediction, Machine Learning, XGBoost, Random Forest, Data Preprocessing, Feature Engineering
Abstract
This project predicts used car prices using a combination of data preprocessing, unsupervised learning, and supervised learning techniques [1, 2]. The goal was to develop an accurate price prediction model by exploring patterns in vehicle features. The dataset was cleaned and enhanced by imputing missing values with a random forest-based imputation method [3], splitting composite features such as engine specifications into separate variables, and grouping categorical variables such as brand and transmission type. Principal Component Analysis (PCA) was used to reduce dimensionality while retaining 95% of the dataset's variance, and unsupervised clustering algorithms, including K-Means, K-Modes, and hierarchical clustering, identified groupings that provided insight into vehicle segmentation. For supervised learning, we implemented and compared multiple models: Elastic Net regression, Random Forest, Support Vector Machines, and XGBoost. The XGBoost model performed best, with an R² of 0.87 and a MAPE of 21.07%, effectively capturing non-linear relationships [1, 2]. Key predictors, including mileage, horsepower, model year, and brand grouping, contributed substantial predictive power and shed light on price drivers. In the open-ended part of the project, we estimated the original prices of cars as if they were brand new using the Random Forest and XGBoost models, adjusting attributes such as mileage and model year to simulate new condition. Because the predicted prices closely matched official release prices, we conclude that these machine learning techniques can predict car prices accurately.
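For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R² and MAPE are conventionally computed. It is not the authors' evaluation code; the function names and the sample prices are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes every true value is non-zero (reasonable for prices).
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical prices for illustration only (not from the study's dataset).
actual = [10000.0, 20000.0, 30000.0, 40000.0]
predicted = [11000.0, 18000.0, 33000.0, 38000.0]

r2 = r_squared(actual, predicted)
err = mape(actual, predicted)
print(f"R^2 = {r2:.3f}, MAPE = {err:.2f}%")
```

Because MAPE averages relative errors, a fixed dollar error weighs more heavily on a cheap car than on an expensive one, which is one reason a model can show a high R² alongside a comparatively large MAPE.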
References
[1] Hankar, M., Birjali, M., & Beni-Hssane, A. (2022). Used Car Price Prediction using Machine Learning: A Case Study. 2022 11th International Symposium on Signal, Image, Video and Communications (ISIVC), 1–4. https://objects.scraper.bibcitation.com/user-pdfs/2024-12-12/849adcb3-3f87-41e3-b337-290735fdf4f1.pdf
[2] Gegic, E., Isakovic, B., Keco, D., Masetic, Z., & Kevric, J. (2019). Car Price Prediction using Machine Learning Techniques. TEM Journal, 8(1), 113–118. https://doi.org/10.18421/tem81-16
[3] Stekhoven, D. J., & Bühlmann, P. (2011). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118. https://doi.org/10.1093/bioinformatics/btr597
License
Copyright (c) 2025 Journal of Computer Science and Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








