Analyzing Worker Productivity In Garment Industry With SQL
The decision-makers in the garment industry would be in a better position in the industry if they track, analyze, and predict the performance of the employees in their factories as it would affect the ability of their business to fulfill the global demand for garment products.👕
In this article, I will explain how I performed an analysis on the dataset that contains the attributes of the garment manufacturing process and the productivity of the employees. This dataset was obtained from UCI Machine Learning Repository. For your information, I performed the analysis using MySQL workbench. I have created a PDF showing the steps to create a new schema and import datasets to the MySQL workbench. You may refer to this PDF if you are not sure how to perform these steps.🧐
Before importing the CSV file in MySQL workbench, I removed the trailing and leading space from the values in the “department” column using Excel’s TRIM()
function. I also renamed “sweing” in this column to “sewing”.
Below are the hypothetical questions I created on my own, my solutions, and the results. 🤓
Query 1: Determine the number of achieved targets and failed targets for each team. Provide a ranking number for the average productivity for each team, where the value 1 is given to the highest average productivity. Also, compute the trimmed mean of incentives for each team. Then, sort the result based on the trimmed mean of incentives in descending order.
For convenience of sake, I created a view called “target_status” which can show us if the particular team has achieved the target. A view is a mechanism for querying data and it does not involve data storage. We can use this view to access data just as though we were querying tables directly.
The above query involved localized sorting, which includes a call to the ranking functions that used the average productivity to generate the rankings, with the values sorted in descending order, whereby the team with the highest average productivity will be ranked 1.
The ORDER BY
clause at the end of the query determines how the result set should be sorted and the other ORDER BY
clause within the ranking function determines how the rankings should be allocated.
The 9th column uses the ROW_NUMBER()
function to assign a unique ranking to each row, without regard to ties. Each of the 12 rows is assigned a number from 1 to 12, with the ranking value arbitrarily assigned for teams who have the same average productivity.
The next two columns assign the same ranking in the case of a tie. However, the RANK()
function returns the same ranking in the case of a tie, with gaps in the rankings. The DENSE_RANK()
function returns the same ranking in case of a tie, with no gaps in the rankings. Looking at row 7 of the result set, you can see that the RANK()
function skips the value 5 and assigns the value 6, whereas the DENSE_RANK()
function assigns the value 5.
Any of these functions would be good if there are no ties in your result set. The RANK()
function may be the BEST option for most situations. Above are the explanations of the SQL code for query 1. 😎
In this case, team 9 received the highest incentives, but it is not the most productive, and not even in the top 5 most productive teams based on the 10th column of the result set. The most productive teams are teams 1, 3, and 12. We could see there is a positive relationship between incentives and productivity after we remove outliers (the highest and lowest incentives) from each team. The worker productivity increases when the incentive increases.
A trimmed mean is a robust statistic because it is less sensitive to bias and it can reduce the effect of skew. With this approach, we do not need to make assumptions that are difficult to test with the relatively limited range of statistical tools available in SQL.
Teams 6, 7, and 8 received the least amount of incentive, and they have the lowest actual productivity on average. The amount of extra incentive stops mattering for actual productivity after reaching a certain threshold because team 11 received a high incentive of 27.63BDT on average but its average productivity was ranked 10th based on the 10th column of the result set.
Query 2: Discover the relationship between productivity, the total number of workers, the number of idle workers, the work in progress (WIP), and the standard minute value (SMV) for each week.
I used the STR_TO_DATE()
function to convert “date” from string to date format before using the WEEK()
function to obtain the week number. I used DATEDIFF()
function to find the difference between the starting date of the week and the ending date of the week.
The result showed that the SMV and the number of workers are positively correlated. More workers need to be allocated to perform longer tasks. However, the average SMV in week 5 was lower than that in week 7, but the average number of workers in week 5 was much higher, and its average WIP also reached its highest point.
WIPs are unfinished goods that are not ready to be sold, but they have been subjected to processing. They are part of the total production costs because we need to provide storage, manpower, and various utilities to maintain them and even keep them secure before any further processing. It is worse if we keep them on the assembly line because it will result in slow production rates. It would be a big issue when if the company sales show a downward trend or remain relatively stable/constant, as the funds would be tied up in WIP and they could not be used for investment or other business purposes unless those WIPs have been completed and sold.
An increase in the WIP would only be a good thing if the company sales show an upward trend because it indicates that there is an increase in the demand for the company’s products, and the production can manufacture faster than it can sell. Hence, most production managements aim to keep their WIP at optimal levels to keep the associated costs low.
Since we are unsure if the company has high, low, or stable sales; hence, we could not justify if a high level of WIP is a good thing or a bad thing for this company. In general, to reduce the WIPs, the management would need to choose operators who have the best skill set for each operation so that workflow is executed smoothly and effectively. The management can transfer those underperforming workers to another position or perhaps terminate them if there is no other position suitable for them. Besides, they can also prepare visual process controls so that the plant personnel can easily identify variations and eliminate the source from the manufacturing process.
In my opinion, more datasets (e.g. sales received, lead time to fulfill the orders) are needed to perform a deeper analysis for weeks 1, 3, 4, and 5, as their average WIP and the average number of workers were higher than that of the weeks that have a higher SMV. The same goes for week 7 because the average number of idle workers is as high as 2.
Query 3: Compare the productivity from one week to another by showing the percentage difference from the prior week.
The LAG()
function is used to retrieve a column value from a prior row in the result set, whereas the LEAD()
function is used to retrieve a column value from the following row. Both functions allow for an optional second parameter (which defaults to 1) to describe the number of rows prior/following from which to retrieve the column value.
Looking at the results, the weekly average productivity of 0.71 for week 3 also appears in the “next_wk_tot” column for week 2, as well as in the “prev_wk_tot” column for week 4. Since there is no row before week 0 in the result set, the value generated by the lag function is null for the first row; likewise, the value generated by the lead function is null for the last row in the result set.
This marks the end of my SQL project on the garment manufacturing dataset. Hope this article can shed some light on how to use SQL to perform data analysis on real-world datasets. Happy learning.😁