Comparative evaluation of hierarchical clustering and k-means for strategic customer segmentation in commercial companies
Abstract
In today's business environment, customer segmentation is a key resource for improving decision-making and designing data-driven business strategies. Clustering techniques have gained relevance in this context because they identify behavioral patterns and group customers by their consumption habits. This study comparatively evaluates the performance of hierarchical clustering and K-means for the strategic segmentation of customers in retail companies, using purchase frequency, transaction value, and customer tenure as variables. The research followed a quantitative, data-analytics approach. The variables were first preprocessed and normalized, and the optimal number of clusters was then determined with the elbow method. Hierarchical clustering served as an exploratory diagnosis of the data structure, and the K-means algorithm produced the final classification into homogeneous groups. The Silhouette index was used as an internal validation metric to assess the quality of the resulting clusters. The results revealed customer segments with distinct behavioral patterns, distinguishing frequent from occasional consumers. In conclusion, combining these techniques yields reliable and consistent segmentations that support strategic decision-making and the development of personalized marketing tactics.
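The workflow summarized in the abstract (normalization, elbow inspection, hierarchical exploration, K-means classification, and Silhouette validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes z-score normalization, Euclidean distance, and average linkage, none of which the abstract specifies. The function names and the eight synthetic customers (purchase frequency, transaction value, tenure in months) are hypothetical and exist only for illustration. The code uses only the Python standard library so that it stays self-contained.

```python
import math
import random
from statistics import mean, pstdev

def zscore(rows):
    """Standardize each column to mean 0 and unit population std (assumed scheme)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with random initial centers; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(mean(c) for c in zip(*members))
    inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia

def agglomerative(points, k):
    """Average-linkage agglomerative clustering, merged down to k clusters."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        return mean([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        a, b = min(pairs, key=lambda xy: link(clusters[xy[0]], clusters[xy[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two non-empty clusters."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = groups[l]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(mean([math.dist(p, q) for q in g])
                for m, g in groups.items() if m != l)
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Synthetic, hypothetical customers: (purchase frequency, transaction value, tenure)
customers = [(12, 80, 36), (10, 95, 40), (11, 88, 30), (13, 90, 42),
             (1, 20, 6), (2, 25, 4), (1, 15, 8), (3, 30, 5)]
X = zscore(customers)

# Elbow inspection: look for the k after which inertia stops dropping sharply.
for k in range(1, 5):
    labels, inertia = kmeans(X, k)
    score = silhouette(X, labels) if k > 1 else float("nan")
    print(f"k={k}  inertia={inertia:.3f}  silhouette={score:.3f}")

hc_labels = agglomerative(X, 2)  # exploratory structure
km_labels, _ = kmeans(X, 2)      # final classification
print("hierarchical:", hc_labels)
print("k-means:     ", km_labels)
```

On real data, a library such as scikit-learn or SciPy would typically replace these hand-written functions, and the choice of linkage and of k should be checked against the dendrogram and the Silhouette values rather than taken from this sketch.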
Copyright (c) 2026 Carlos Bladimir Moreano Guerra, Tania Eslavenska Escobar Erazo, Luis Fernando Herrera Moreno, Julio Andrés Escobar Cardenas

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
