dc.contributor.author: Bolivar, Armando
dc.date.accessioned: 2024-08-01T18:32:48Z
dc.date.available: 2024-08-01T18:32:48Z
dc.date.issued: 2024-07-04 [es_MX]
dc.identifier.uri: https://cathi.uacj.mx/20.500.11961/28637
dc.description.abstract: An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid application of various machine learning algorithms with a predominant focus on building and improving the performance of these models. However, this model-centric approach ignores the fact that data quality is crucial for building robust and accurate models. Several dataset issues, such as class imbalance, high dimensionality, and class overlapping, affect data quality, introducing bias to machine learning models. Therefore, adopting a data-centric approach is essential to constructing better datasets and producing effective models. Besides data issues, Big Data imposes new challenges, such as the scalability of algorithms. This paper proposes a scalable hybrid approach to jointly address class imbalance, high dimensionality, and class overlapping in Big Data domains. The proposal is based on well-known data-level solutions whose main operation is calculating the nearest neighbor using the Euclidean distance as a similarity metric. However, these strategies may lose their effectiveness on datasets with high dimensionality. Hence, the data quality is achieved by combining a data transformation approach using fractional norms and SMOTE to obtain a balanced and reduced dataset. Experiments carried out on nine two-class imbalanced and high-dimensional large datasets showed that our scalable methodology implemented in Spark outperforms the traditional approach. [es_MX]
dc.description.uri: https://www.mdpi.com/2076-3417/14/13/5845 [es_MX]
dc.language.iso: en_US [es_MX]
dc.relation.ispartof: IIT research product [es_MX]
dc.relation.ispartof: Instituto de Ingeniería y Tecnología [es_MX]
dc.rights: Attribution 2.5 Mexico [*]
dc.rights.uri: http://creativecommons.org/licenses/by/2.5/mx/ [*]
dc.subject.other: info:eu-repo/classification/cti/7 [es_MX]
dc.title: Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping [es_MX]
dc.type: Article [es_MX]
dcterms.thumbnail: http://ri.uacj.mx/vufind/thumbnails/rupiiit.png [es_MX]
dcrupi.instituto: Instituto de Ingeniería y Tecnología [es_MX]
dcrupi.cosechable: Yes [es_MX]
dcrupi.norevista: 13 [es_MX]
dcrupi.volumen: 14 [es_MX]
dcrupi.nopagina: 1-15 [es_MX]
dc.identifier.doi: https://doi.org/10.3390/app14135845 [es_MX]
dc.contributor.coauthor: García, Vicente
dc.contributor.coauthor: Florencia, Rogelio
dc.journal.title: Applied Sciences [es_MX]
dc.contributor.coauthorexterno: Alejo, Roberto
dc.contributor.coauthorexterno: Sánchez, J. Salvador
dcrupi.colaboracionext: Spain [es_MX]
dc.contributor.alumnoprincipal: 198665 [es_MX]
dcrupi.pronaces: None [es_MX]
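
The abstract refers to replacing the Euclidean distance with fractional norms in nearest-neighbor computations. The sketch below illustrates that idea only; it is not the authors' Spark implementation. The function names and the exponent p = 0.5 are assumptions made for illustration. A fractional norm here means the Minkowski form (sum_i |x_i - y_i|^p)^(1/p) with 0 < p < 1, which does not satisfy the triangle inequality and so is not a true metric.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Return (sum_i |x_i - y_i|^p)^(1/p).

    p = 2 gives the Euclidean distance; 0 < p < 1 gives a fractional norm.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_neighbor(query: Sequence[float],
                     points: Sequence[Sequence[float]],
                     p: float) -> int:
    """Brute-force search: index of the point closest to query under exponent p."""
    return min(range(len(points)),
               key=lambda i: minkowski_distance(query, points[i], p))


# Hypothetical toy data: the nearest neighbor changes with the exponent.
query = [0.0, 0.0]
points = [[3.0, 3.0], [0.0, 5.0]]
euclidean_choice = nearest_neighbor(query, points, p=2.0)   # picks index 0
fractional_choice = nearest_neighbor(query, points, p=0.5)  # picks index 1
```

With p = 2 the point (3, 3) is closer, because its difference is spread evenly across both coordinates. With p = 0.5 the point (0, 5) is closer, because fractional exponents penalize differences spread over many coordinates more than a difference concentrated in one. A neighbor-based oversampler such as SMOTE would use whichever distance is chosen when selecting the neighbors it interpolates between.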


Except where otherwise noted, this item's license is described as Attribution 2.5 Mexico
