FIRM-LEVEL PREDICTORS OF LABOUR TAX EVASION. Alice Mikk

The results of the wage regression suggest that women experience a wage penalty, which is in line with gender pay gap estimates for Estonia. Meriküll & Tverdostup (2023) note a certain inertia in the wage gap after Estonia's transition from communism to capitalism and estimate that the gap declined from 34% in 1989 to around 19% in 2019. SA estimated the gap at 14.9% in 2021 and 17.7% in 2022, with the largest gap in financial and insurance activities (Statistics Estonia, table PA5355). According to the regression, being a woman rather than a man is associated with a 22% lower wage, holding other factors fixed. The results also suggest that age and experience have a non-linear relationship with wage.
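If, as is common in wage equations, the dependent variable is the log of the wage (an assumption; this section does not state the functional form), a dummy coefficient β corresponds to an exact percentage effect of 100·(exp(β) - 1), and a quadratic specification in age or experience implies a turning point at -β₁/(2β₂). The sketch below uses hypothetical coefficients chosen only for illustration, not the estimates behind the 22% figure.

```python
import math


def dummy_pct_effect(beta: float) -> float:
    """Exact % wage difference implied by a dummy coefficient in a log-wage model."""
    return 100.0 * (math.exp(beta) - 1.0)


def quadratic_turning_point(b_linear: float, b_squared: float) -> float:
    """x at which b_linear*x + b_squared*x**2 reaches its extremum (a peak if b_squared < 0)."""
    if b_squared == 0:
        raise ValueError("a turning point requires a non-zero squared term")
    return -b_linear / (2.0 * b_squared)


# Hypothetical coefficients, for illustration only.
beta_female = -0.25               # coefficient on a 'woman' dummy
b_age, b_age_sq = 0.04, -0.0005   # linear and squared age terms

print(f"female dummy: {dummy_pct_effect(beta_female):.1f}% wage difference")
print(f"predicted log wage peaks at age {quadratic_turning_point(b_age, b_age_sq):.0f}")
```

With these made-up values, a coefficient of -0.25 corresponds to a wage roughly 22% lower, which shows that the raw coefficient and the percentage effect drift apart as the coefficient grows in absolute size.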

Additionally, basic, secondary and tertiary education affect wages positively compared to having only pre-school education. The effect of basic education relative to pre-school education is statistically significant for 2021 but not for 2022. This could be because basic education is the mandatory minimum level of general education in Estonia, so employers do not differentiate individuals who have only pre-school education. Another explanation could be that the base group is relatively small, so the effect of basic education relative to it is estimated imprecisely. The number of observations in each binary and categorical variable group is presented in Appendix 1. Interestingly, professionals appear to have higher expected wages than managers: the model suggests that professionals earn a 10-12% higher salary than managers. The expected wage of every other occupation group is lower than that of managers, with elementary occupations earning 39-41% less, holding other variables constant.
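Each education and occupation effect above is a contrast against an omitted base group (pre-school education; managers). A minimal stdlib sketch of this treatment (dummy) coding follows; the function name and category labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def treatment_code(
    values: Sequence[str], categories: Sequence[str], reference: str
) -> Tuple[List[str], List[Dict[str, int]]]:
    """Encode a categorical variable as 0/1 dummies, omitting the reference (base) group."""
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference!r}")
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values outside the category list: {sorted(unknown)}")
    levels = [c for c in categories if c != reference]
    rows = [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]
    return levels, rows


# Hypothetical labels mirroring the education groups discussed above.
education_levels = ["pre-school", "basic", "secondary", "tertiary"]
levels, rows = treatment_code(["basic", "pre-school", "tertiary"], education_levels, "pre-school")
for row in rows:
    print(row)
```

An individual with only pre-school education is coded as zeros on every dummy, so each education coefficient is read against that group; if the group is small, every one of these contrasts rests on few observations.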

As for the NACE sector dummies, working in manufacturing is associated, ceteris paribus, with higher wages than working in the other sectors, with the lowest wages in construction. The regional dummies also give the expected results, with the highest wages in Northern Estonia and the lowest in North-Eastern Estonia. Holding other factors fixed, working in North-Eastern Estonia rather than Northern Estonia is associated with a 23% lower wage. This is in line with SA data: the average gross salary in Harju county (Northern Estonia) ranged from 1593 to 1730 euros in 2021 (2022: 1751-1946 euros), compared with 1102-1184 euros (2022: 1176-1334 euros) in Ida-Viru county (North-Eastern Estonia) (Statistics Estonia, table PA117).
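The SA salary ranges allow a quick comparison between the unconditional regional gap and the 23% gap estimated with other factors held fixed. Pairing the lower bounds with each other and the upper bounds with each other is an assumption made only for this illustration, since the source reports ranges rather than matched values; the helper name is hypothetical.

```python
# Average gross monthly salary ranges in euros, as quoted above (Statistics Estonia, table PA117).
harju = {2021: (1593, 1730), 2022: (1751, 1946)}     # Northern Estonia
ida_viru = {2021: (1102, 1184), 2022: (1176, 1334)}  # North-Eastern Estonia


def raw_gap_pct(low: tuple, high: tuple) -> tuple:
    """Unconditional % by which `low` falls short of `high`, pairing lower and upper bounds."""
    return tuple(100.0 * (1.0 - lo / hi) for lo, hi in zip(low, high))


for year in (2021, 2022):
    gap_lower, gap_upper = raw_gap_pct(ida_viru[year], harju[year])
    print(f"{year}: raw gap {gap_lower:.1f}% to {gap_upper:.1f}%")
```

The unconditional gaps come out at roughly 31-33%, larger than the 23% conditional estimate; one plausible reading, not tested here, is that part of the raw regional difference reflects differences in education, occupation and sector composition.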

Next, the distribution of residuals is inspected. Employees in the bottom 10% of the residual distribution are considered suspiciously low-paid: given all of their individual characteristics, the wage regression predicts a wage substantially higher than the wage reported to the tax authorities. This may indicate that the employee receives part of their salary in an envelope, so that labour taxes are evaded.

Table 9 presents the total number of firms (employers), the number of firms with at least one employee in the bottom 10% of the residual distribution, and the number of firms with 50% or more of their employees in the bottom 10%. Taking 2021 and 2022 separately, the wage regression suggests for both years that nearly 50% of companies have at least one employee whose salary is far below what the regression would predict. However, the share of companies for which 50% or more of the employees fall into the bottom 10% of the residual distribution is slightly above 26% in both 2021 and 2022. Moreover, 3552 companies have 50% or more of their employees in the bottom 10% of the residual distribution in both consecutive years.

Table 9. Share of tax evading firms
Year | Total firms | Firms with at least one employee in bottom 10% | % | Tax evading firms (≥50% of employees in bottom 10%) | %
2021 | 22 672 | 11 207 | 49.4% | 5997 | 26.45%
2022 | 21 463 | 10 488 | 48.9% | 5666 | 26.40%
Source: Author's calculations

Investigating the individuals and firms in the bottom 10% of the residual distribution shows that 9.8% of the identified evasion is due to individuals being paid the minimum wage while the wage regression predicts a much higher salary. Tonin (2011) suggests that tax evasion among employees is concentrated at the lower end of the productivity distribution and among low-wage workers. Of the individuals in the bottom 10%, more than a third earn less than 110% of the minimum wage, and none of their administrative average salaries exceeds 2000 euros. Moreover, 75.2% of the individuals in the bottom 10% are male. Of all individuals working in the construction sector, 14.3% fall into the bottom 10%, followed by 12.4% in transportation and storage. According to the survey by EKI (Josing, 2016), individuals receiving envelope wages are typically young, male, with lower educational attainment and lower salaries, living outside cities and working in smaller firms active in construction, services or agriculture.

3.1.2. Logistic regression

After the two subsets of firms classified as tax compliant or tax evading are combined and merged with the firms' financial data, logistic regression is used to model the relationship between the binary outcome and the predictor variables. Additionally, the out-of-sample probability of tax evasion is predicted for companies whose classification is unknown.
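The tax evading labels used in this logistic regression rest on the residual rule behind Table 9. A minimal sketch of that rule follows, assuming the residual is the log of the reported wage minus the log of the predicted wage and that the 10th percentile is taken over all employees in a given year (so it would be run once per year). It reproduces only the two firm-level indicators in Table 9 (at least one flagged employee; 50% or more flagged); the rule for the presumed tax compliant group is not restated in this section, so no compliant label is produced. Function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, NamedTuple, Tuple


class FirmFlags(NamedTuple):
    any_flagged: bool      # at least one employee in the bottom 10% of residuals
    share_flagged: float   # share of the firm's employees in the bottom 10%
    evading: bool          # share_flagged >= share_cutoff


def flag_firms(
    records: Iterable[Tuple[str, float]], share_cutoff: float = 0.5
) -> Dict[str, FirmFlags]:
    """records: (firm_id, residual) per employee, residual = log actual wage - log predicted wage."""
    records = list(records)
    # First decile cut point of the pooled residual distribution (needs >= 2 residuals).
    p10 = quantiles([r for _, r in records], n=10)[0]
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for firm, residual in records:
        totals[firm] += 1
        if residual <= p10:
            flagged[firm] += 1
    result = {}
    for firm, n in totals.items():
        share = flagged[firm] / n
        result[firm] = FirmFlags(flagged[firm] > 0, share, share >= share_cutoff)
    return result


# Hypothetical residuals for 20 employees in three firms.
data = (
    [("A", r) for r in (-0.9, -0.8, 0.1, 0.2)]
    + [("B", r) for r in (-0.3, -0.2, 0.0, 0.1, 0.1, 0.2, 0.3, 0.4)]
    + [("C", r) for r in (-0.1, 0.0, 0.0, 0.1, 0.2, 0.2, 0.3, 0.5)]
)
for firm, flags in sorted(flag_firms(data).items()):
    print(firm, flags)
```

In this toy data the two lowest residuals both belong to firm A, so A has half of its staff flagged and meets the 50% rule, while B and C have no flagged employees.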

After the availability of data for all necessary variables was assessed, a subset of 23 791 firms (2022: 24 314) remains in the dataset for 2021, of which 5195 (2022: 4860) firms are available for training and testing. Of the latter, 529 (2022: 520) firms are presumed to be tax compliant and 4666 (2022: 4340) are presumed to be tax evading. These firms are randomly split into a training set (80% of the observations) and a test set (20% of the observations) to model the outcome of labour tax evasion. Because one class is far more prevalent in the training set, a weighted loss function is applied. Each logistic regression coefficient represents the change in the log odds of the outcome for a one-unit increase in the predictor variable, ceteris paribus. Because log odds do not translate directly into changes in probability, the logit coefficients mainly show the direction of each independent variable's effect on the dependent variable, so marginal effects are also calculated to quantify the impact. The marginal effects are presented in Table 10.
Table 10. Marginal effects
Variable | Tax evading 2021 | Tax evading 2022
Size | -0.015*** (0.001) | -0.017*** (0.001)
Construction | 0.114*** (0.016) | 0.196*** (0.017)
Wholesale and retail trade | 0.043* (0.015) | 0.057*** (0.016)
Transportation and storage | 0.166*** (0.017) | 0.183*** (0.018)
Turnover | -0.097*** (0.004) | -0.077*** (0.004)
Debt to assets | -0.035*** (0.006) | -0.023* (0.007)
Short-term debt to assets | -0.001 (0.001) | 0.011** (0.004)
Cash to assets | -0.067 (0.022) | -0.041 (0.021)
Turnover to assets | 0.016*** (0.002) | 0.005** (0.001)
COGS to turnover | 0.012* (0.006) | -0.028*** (0.005)
Observations | 4 156 | 3 888
Note: Results are based on Eq. 2. Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Standard errors in parentheses.
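Two steps in this subsection can be illustrated in a few lines: re-weighting the log loss so that the minority class (here the presumed compliant firms) is not swamped, and turning logit coefficients into average marginal effects (AMEs) on the probability scale. The dependency-free sketch below makes assumptions the section does not confirm: it uses "balanced" class weights n / (2·n_c), one common scheme (for the 2021 labelled subset this would give roughly 4.9 for compliant and 0.56 for evading firms), plain gradient descent instead of a statistics package, and simulated data with hypothetical variable names. For a continuous regressor such as size, the AME averages p(1 - p)·β over firms; for a sector dummy it averages the change in predicted probability when the dummy is switched from 0 to 1. Whether Table 10 reports AMEs or effects at the means is not stated here.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of class 1; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))


def fit_weighted_logit(X, y, lr=0.5, epochs=1000) -> List[float]:
    """Gradient descent on the class-weighted log loss with 'balanced' weights n / (2 * n_c)."""
    n, k = len(X), len(X[0])
    counts = {c: sum(1 for yi in y if yi == c) for c in (0, 1)}
    if min(counts.values()) == 0:
        raise ValueError("both classes must be present")
    weight = {c: n / (2 * counts[c]) for c in (0, 1)}
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = weight[yi] * (predict(beta, xi) - yi)   # weighted residual
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def ame_continuous(beta, X, j: int) -> float:
    """Average of dP/dx_j = p(1 - p) * beta_j over the sample."""
    return sum(p * (1 - p) for p in (predict(beta, xi) for xi in X)) / len(X) * beta[j + 1]


def ame_dummy(beta, X, j: int) -> float:
    """Average change in P when dummy x_j is switched from 0 to 1 for every observation."""
    total = 0.0
    for xi in X:
        x1, x0 = list(xi), list(xi)
        x1[j], x0[j] = 1.0, 0.0
        total += predict(beta, x1) - predict(beta, x0)
    return total / len(X)


# Simulated data (hypothetical): standardised firm size and a construction dummy.
rng = random.Random(42)
X = [[rng.gauss(0, 1), float(rng.random() < 0.4)] for _ in range(400)]
y = [int(rng.random() < sigmoid(0.8 - 0.8 * s + 1.5 * c)) for s, c in X]
beta = fit_weighted_logit(X, y)
print("AME size:", round(ame_continuous(beta, X, 0), 3))
print("AME construction:", round(ame_dummy(beta, X, 1), 3))
```

Because the data are simulated with a negative size effect and a positive construction effect, the AMEs come out negative and positive respectively, mirroring the signs in Table 10; their magnitudes carry no information about the Estonian data.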
The results of the logistic regression suggest that company size is negatively associated with the probability of tax evasion, which is in line with previous papers by Beneish (1999) and Putniņš & Sauka (2015). For 2021, a one-unit increase in company size (one additional person employed) is associated with a decrease in the probability of tax evasion of 0.015, i.e. 1.5 percentage points (2022: 0.017), on average, holding other variables constant. Moreover, compared with manufacturing, operating in construction, wholesale and retail trade, or transportation and storage is associated with a higher probability of engaging in tax evasion. In 2021, transportation and storage shows a higher probability of labour tax evasion than construction, but in 2022 construction shows the largest effect.

The results of the logistic regression also suggest that higher turnover is associated with a lower predicted probability of tax evasion. This is in line with previous studies by Putniņš & Sauka (2015), Abdixhiku et al. (2017) and Benkovskis & Fadejeva (2022). If turnover exceeds certain thresholds, a firm must meet additional reporting obligations and undergo audits; therefore, the higher the turnover, the higher the propensity to comply with the rules. Benkovskis & Fadejeva (2022) find that higher debt to assets and short-term debt to assets ratios are associated with a higher probability of tax evasion; the results on the Estonian dataset, however, suggest otherwise for the debt to assets ratio. Beneish (1999) also reports that evading firms tend to be more leveraged, and Hajek & Henriques (2017) suggest that higher leverage gives the firm an incentive to boost financial performance. The opposite result on Estonian data could be due to the inclusion of a large number of micro enterprises in the analysis, as smaller companies are less likely to have substantial loan liabilities on their balance sheets. The same reasoning may apply to short-term liabilities, although the short-term debt to assets effect is not statistically significant in 2021 and is positive in 2022.

An increase in the cash to assets ratio is associated with a decrease in the predicted probability of tax evasion; however, the variable is not statistically significant at the 5% level in either 2021 or 2022. Benkovskis & Fadejeva (2022) also report a negative relationship, suggesting that firms with relatively high cash holdings are less likely to be involved in labour tax evasion. The coefficient for turnover to assets is positive and statistically significant in both years (at the 0.1% level for 2021 and at the 1% level for 2022). The higher probability of labour tax evasion for firms with a higher turnover to assets ratio could be due to overreporting of revenue in tax evading firms, a line of reasoning supported by Benkovskis & Fadejeva (2022). The coefficients for COGS to turnover suggest a positive effect on the predicted probability of labour tax evasion for 2021 and a negative effect for 2022, although the 2021 effect is significant only at the 5% level. Benkovskis & Fadejeva (2022) also report only marginal statistical significance for intermediate inputs to turnover.

The logistic regression model is evaluated on the test set, with class 1 denoting tax evading and class 0 tax compliant firms. The probability threshold is set to 65%. Table 11 presents the confusion matrix for the test sample at this threshold for 2021 and 2022. For 2021, the model correctly predicted 568 instances (2022: 510) as belonging to class 1 (true positives) and incorrectly predicted 16 instances (2022: 13) as belonging to class 1 when they actually belong to class 0 (false positives). The model correctly predicted 92 instances (2022: 94) as belonging to class 0 (true negatives) and incorrectly predicted 363 instances (2022: 355) as belonging to class 0 when they actually belong to class 1 (false negatives).
Table 11. Confusion matrix

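Actual class | 2021: predicted 0 (False) | 2021: predicted 1 (True) | 2022: predicted 0 (False) | 2022: predicted 1 (True)
0 | 92 | 16 | 94 | 13
1 | 363 | 568 | 355 | 510
Source: Author's calculations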
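Each cell of the confusion matrix comes from comparing a test firm's predicted probability with the 65% cut-off. A minimal sketch with hypothetical data follows; treating a probability exactly equal to the threshold as tax evading is an assumption. The returned order (tn, fp, fn, tp) follows the table, whose rows are the actual class and whose columns are the predicted class.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], p_hat: Sequence[float], threshold: float = 0.65
) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp), classifying a firm as tax evading (1) when p_hat >= threshold."""
    if len(y_true) != len(p_hat):
        raise ValueError("y_true and p_hat must have the same length")
    tn = fp = fn = tp = 0
    for actual, p in zip(y_true, p_hat):
        predicted = int(p >= threshold)
        if actual == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return tn, fp, fn, tp


# Hypothetical test-set labels and predicted probabilities.
y_true = [0, 0, 1, 1, 1]
p_hat = [0.20, 0.70, 0.64, 0.65, 0.90]
print(confusion_counts(y_true, p_hat))        # threshold 0.65
print(confusion_counts(y_true, p_hat, 0.5))   # a lower cut-off, for comparison
```

Lowering the cut-off moves firms from false negatives to true positives (and, in general, from true negatives to false positives), so the large false-negative counts in Table 11 are likely partly a consequence of choosing a threshold above 50%.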

To measure the prediction performance of the logistic regression model on the test data, accuracy, TP rate, FP rate, TN rate, FN rate, F-score and AUC are presented in Table 12.
Table 12. Prediction performance ratios

Year | Accuracy (%) | TP rate (%) | FP rate (%) | TN rate (%) | FN rate (%) | F-score | AUC
2021 | 63.52 | 61.01 | 14.81 | 85.19 | 38.99 | 0.75 | 0.84
2022 | 62.14 | 58.96 | 12.15 | 87.85 | 41.04 | 0.73 | 0.88
Source: Author's calculations
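The threshold-based ratios in Table 12 follow directly from the Table 11 counts, so they can be recomputed as a consistency check; the F-score is taken here to be the usual F1, the harmonic mean of precision and TP rate. AUC is different: it summarises ranking quality over all thresholds and needs the predicted probabilities themselves, so the `auc` helper below takes scores rather than counts and can only be shown on toy data. Function names are hypothetical.

```python
from typing import Dict, Sequence


def performance(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Threshold-based ratios from confusion-matrix counts (rates as fractions)."""
    tp_rate = tp / (tp + fn)          # recall / sensitivity
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "tn_rate": 1.0 - fp_rate,
        "fn_rate": 1.0 - tp_rate,
        "f_score": 2 * precision * tp_rate / (precision + tp_rate),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Counts (tn, fp, fn, tp) taken from Table 11.
table_11 = {2021: (92, 16, 363, 568), 2022: (94, 13, 355, 510)}
for year, counts in table_11.items():
    m = performance(*counts)
    pct = {k: round(100 * v, 2) for k, v in m.items() if k != "f_score"}
    print(year, pct, "F-score:", round(m["f_score"], 2))
```

Run on the Table 11 counts, this reproduces every threshold-based entry of Table 12 after rounding. That AUC (0.84 and 0.88) is much higher than accuracy at the 65% cut-off suggests the model ranks firms considerably better than a single threshold reveals.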

The prediction performance of the logistic regression model suggests that for 2021, 61% (2022: 59%) of the tax evading companies are correctly predicted as fraudulent and 85% (2022: 88%) of the compliant companies are correctly predicted as non-fraudulent. This is in line with previous studies, which report a somewhat lower TP rate for logit and probit models than for machine learning approaches. Cecchini et al. (2010) compared the prediction performance of different approaches on their dataset and found that a logistic regression following Dechow et al. (2009) correctly predicted 64.5% of the fraudulent companies and 66.4% of the non-fraudulent companies, while support vector machines using the financial kernel correctly predicted 80.0% of the fraudulent and 90.6% of the non-fraudulent companies, outperforming the logistic regression model.
