Comparative evaluation of hierarchical clustering and k-means for strategic customer segmentation in commercial companies

  • Carlos Bladimir Moreano Guerra, Universidad Central del Ecuador, Quito, Ecuador
  • Tania Eslavenska Escobar Erazo, Universidad Central del Ecuador, Quito, Ecuador
  • Luis Fernando Herrera Moreno, Universidad Politécnica Salesiana, Quito, Ecuador
  • Julio Andrés Escobar Cardenas, Universidad de las Fuerzas Armadas, Quito, Ecuador
Keywords: Customer Segmentation, K-means, Hierarchical Clustering, Customer Behavior, Cluster Analysis

Abstract

In today's business environment, customer segmentation is a key resource for data-driven decision-making and the design of business strategies. Clustering techniques have gained relevance in this context because they identify behavioral patterns and group customers by their consumption habits. This study comparatively evaluates the performance of hierarchical clustering and K-means for the strategic segmentation of customers of retail companies, using purchase frequency, transaction value, and customer tenure as variables. The research followed a quantitative approach based on data analytics. The variables were first preprocessed and normalized, and the optimal number of clusters was then determined with the elbow method. Hierarchical clustering served as an exploratory diagnosis of the data structure, and the K-means algorithm produced the final classification into homogeneous groups. The Silhouette index was used as an internal validation metric to assess the quality of the resulting clusters. The results revealed customer segments with distinct behavioral patterns, distinguishing frequent from occasional consumers. In conclusion, combining the two techniques yields reliable and consistent segmentations that support strategic decision-making and the development of personalized marketing tactics.
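
The workflow summarized in the abstract (normalization, elbow method, K-means, Silhouette validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the six customer records, the function names, and the fixed random seed are hypothetical and chosen only for illustration, and the exploratory hierarchical step is omitted for brevity. Only the Python standard library is assumed.

```python
import math
import random
from statistics import mean, pstdev

def zscore(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [mean(c) for c in zip(*members)]
    wcss = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, wcss

def silhouette(points, labels):
    """Mean Silhouette coefficient; meaningful only with two or more clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {}  # cluster label -> distances from p to that cluster's points
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:  # singleton cluster or single cluster: score 0
            scores.append(0.0)
            continue
        a = mean(own)                              # mean distance within own cluster
        b = min(mean(d) for d in dists.values())   # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Hypothetical records: (purchases per year, mean transaction value, tenure in months)
customers = [
    (24, 35.0, 48), (30, 42.0, 60), (27, 38.5, 54),  # frequent buyers
    (3, 120.0, 6), (2, 95.0, 4), (4, 110.0, 9),      # occasional buyers
]
X = zscore(customers)

# Elbow method: inspect how the within-cluster sum of squares changes with k
wcss_by_k = {k: kmeans(X, k)[1] for k in range(1, 5)}
for k, w in wcss_by_k.items():
    print(k, round(w, 3))

# Final partition and its internal validation
labels, _ = kmeans(X, 2)
score = silhouette(X, labels)
print(labels, round(score, 3))
```

On real data the elbow is read from the plot of `wcss_by_k` rather than fixed in advance, and several random initializations are usually compared, since Lloyd's algorithm can stop at a local optimum.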

Published
2026-04-12
How to Cite
Moreano Guerra, C. B., Escobar Erazo, T. E., Herrera Moreno, L. F., & Escobar Cardenas, J. A. (2026). Comparative evaluation of hierarchical clustering and k-means for strategic customer segmentation in commercial companies. GADE: Scientific Journal, 6(1), 674-703. https://doi.org/10.63549/rg.v6i1.806