
dc.contributor.author: García, Vicente
dc.date.accessioned: 2019-11-20T00:29:47Z
dc.date.available: 2019-11-20T00:29:47Z
dc.date.issued: 2019-10-14
dc.identifier.uri: http://cathi.uacj.mx/20.500.11961/8554
dc.description.abstract: Data plays a key role in the design of expert and intelligent systems and therefore, data preprocessing appears to be a critical step to produce high-quality data and build accurate machine learning models. Over the past decades, increasing attention has been paid towards the issue of class imbalance and this is now a research hotspot in a variety of fields. Although the resampling methods, either by under-sampling the majority class or by over-sampling the minority class, stand among the most powerful techniques to face this problem, their strengths and weaknesses have typically been discussed based only on the class imbalance ratio. However, several questions remain open and need further exploration. For instance, the subtle differences in performance between the over- and under-sampling algorithms are still under-comprehended, and we hypothesize that they could be better explained by analyzing the inner structure of the data sets. Consequently, this paper attempts to investigate and illustrate the effects of the resampling methods on the inner structure of a data set by exploiting local neighborhood information, identifying the sample types in both classes and analyzing their distribution in each resampled set. Experimental results indicate that the resampling methods that produce the highest proportion of safe samples and the lowest proportion of unsafe samples correspond to those with the highest overall performance. The significance of this paper lies in the fact that our findings may contribute to gain a better understanding of how these techniques perform on class-imbalanced data and why over-sampling has been reported to be usually more efficient than under-sampling. The outcomes in this study may have impact on both research and practice in the design of expert and intelligent systems since a priori knowledge about the internal structure of the imbalanced data sets could be incorporated to the learning algorithms. [es_MX]
dc.description.uri: https://www.sciencedirect.com/science/article/pii/S0957417419307432 [es_MX]
dc.language.iso: en_US [es_MX]
dc.relation.ispartof: IIT research product [es_MX]
dc.relation.ispartof: Instituto de Ingeniería y Tecnología [es_MX]
dc.subject: Class imbalance [es_MX]
dc.subject: Sample types [es_MX]
dc.subject: Local neighborhood [es_MX]
dc.subject.other: info:eu-repo/classification/cti/1 [es_MX]
dc.title: Understanding the apparent superiority of over-sampling through an analysis of local information for class-imbalanced data [es_MX]
dc.type: Article [es_MX]
dcterms.thumbnail: http://ri.uacj.mx/vufind/thumbnails/rupiiit.png [es_MX]
dcrupi.instituto: Instituto de Ingeniería y Tecnología [es_MX]
dcrupi.cosechable: Yes [es_MX]
dcrupi.norevista: 0 [es_MX]
dcrupi.volumen: 0 [es_MX]
dcrupi.nopagina: 1-19 [es_MX]
dc.contributor.coauthor: Sánchez Garreta, Josep Salvador
dc.contributor.coauthor: Marqués, Ana
dc.contributor.coauthor: Florencia, Rogelio
dc.contributor.coauthor: Rivera Zarate, Gilberto
dc.journal.title: Expert Systems with Applications [es_MX]
dc.lgac: No line of generation [es_MX]
dc.cuerpoacademico: Procesamiento de Señales [es_MX]




Av. Plutarco Elías Calles #1210 • Fovissste Chamizal
Ciudad Juárez, Chihuahua, México • C.P. 32310 • Tel. (+52) 688-2100 to 09