Abstract:
Purpose: To develop a model that identifies the quality dimensions most relevant to predicting defective products in an industrial setting, using high-dimensional variables and Lasso logistic regression from the binomial family. This involves decomposing an input vector $y \in \mathbb{R}^{n}$ into a linear combination of a small number of basis elements (columns) of a matrix $X \in \mathbb{R}^{n \times p}$, where $n$ represents the number of manufactured products and $p$ the number of quality dimensions of each product. The aim is to approximate $y \approx X\beta$, where $\beta \in \mathbb{R}^{p}$ is a sparse vector containing $k$ non-zero coefficients, with $k \ll p$. Design/methodology/approach: Lasso logistic regression from the binomial family was used, complemented with other tools to validate the model. Findings: The model identified 408 significant variables out of a total of 1,555 features. These were categorized into five zones according to their impact: critical (4.17%), important (8.09%), moderate (41.42%), minimal (46.08%), and irrelevant (0.25%). The model achieved a binomial deviance of 0.61, demonstrating its effectiveness in identifying and prioritizing critical quality characteristics in complex industrial processes. Originality/value: This methodology provides a practical tool for monitoring and quality control in industrial environments where high-dimensional binary data are generated.
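
As an illustration of the technique named above, the following minimal sketch fits an L1-penalized (Lasso) logistic regression by proximal gradient descent and reports which coefficients remain non-zero. It is not the authors' implementation: the abstract does not state the software or solver used, so the solver, the synthetic data, the penalty value `lam`, and all function names here are assumptions chosen for illustration only.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(row, intercept, beta):
    """Predicted probability of the positive (defective) class for one row."""
    return sigmoid(intercept + sum(b * x for b, x in zip(beta, row)))


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimize mean binomial negative log-likelihood + lam * ||beta||_1.

    Uses proximal gradient descent (ISTA); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [predict_proba(row, intercept, beta) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        # Gradient step followed by soft-thresholding yields exact zeros.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


def mean_binomial_deviance(X, y, intercept, beta):
    """Average of -2 * log-likelihood per observation."""
    total = 0.0
    for row, yi in zip(X, y):
        prob = min(max(predict_proba(row, intercept, beta), 1e-12), 1 - 1e-12)
        total += -2.0 * (yi * math.log(prob) + (1 - yi) * math.log(1 - prob))
    return total / len(X)


# Synthetic data (an assumption, not the study's data): n products, p features,
# only the first three features truly influence the defect probability.
random.seed(0)
n, p = 200, 20
true_beta = [2.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < predict_proba(row, 0.0, true_beta) else 0 for row in X]

intercept, beta = fit_lasso_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
deviance = mean_binomial_deviance(X, y, intercept, beta)
print("selected features:", selected)
print("mean binomial deviance:", round(deviance, 3))
```

In this sketch `selected` plays the role of the $k$ retained quality dimensions, and larger values of `lam` would be expected to retain fewer of them. In practice the penalty is usually chosen by cross-validation rather than fixed in advance.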