Update README.md
Fix math syntax and edit some sections.
README.md
CHANGED
@@ -16,7 +16,7 @@ pinned: false
|
|
16 |
|
17 |
## Metric Description
|
18 |
|
19 |
-
|
20 |
|
21 |
|Digits|Group level |
|
22 |
|--|--|
|
@@ -25,9 +25,7 @@ The hierarchical structure of the ISCO-08 classification scheme, as depicted in
|
|
25 |
| 3-digits | Minor groups |
|
26 |
| 4-digits | Unit groups |
|
27 |
|
28 |
-
|
29 |
-
|
30 |
-

|
31 |
|
32 |
In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
|
33 |
|
@@ -50,48 +48,62 @@ The measure applies a higher penalty for errors that occur between more distant
|
|
50 |
|
51 |
Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
|
52 |
|
53 |
-
The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision (
|
54 |
|
55 |
-
To calculate the hierarchical measure,
|
56 |
|
57 |
-
|
58 |
|
59 |
-
|
60 |
|
61 |
-
|
62 |
|
63 |
-
|
64 |
|
65 |
-
|
66 |
|
67 |
-
|
68 |
|
69 |
-
|
70 |
|
71 |
-
|
72 |
-
hF_β = \frac{(β^2 + 1) · hP · hR}{(β^2 · hP + hR)}, β ∈ [0, +∞)
|
73 |
-
$$
|
74 |
|
75 |
-
|
76 |
|
77 |
-
|
78 |
|
79 |
-
|
80 |
|
81 |
-
|
82 |
|
83 |
-
|
84 |
|
85 |
-
|
86 |
|
87 |
-
- **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
|
88 |
- **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
|
89 |
- **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). These are the classifications to compare against the ground truth.
|
90 |
|
91 |
### Output Values
|
92 |
|
93 |
-
*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
|
94 |
-
|
95 |
**Example output**:
|
96 |
|
97 |
```python
|
@@ -107,13 +119,19 @@ Values are decimal numbers between 0 and 1. Higher scores are better.
|
|
107 |
|
108 |
#### Values from Popular Papers
|
109 |
|
110 |
-
|
111 |
|
112 |
-
|
113 |
|
114 |
-
|
115 |
|
116 |
-
|
117 |
|
118 |
```python
|
119 |
def compute_metrics(p: EvalPrediction):
|
@@ -126,22 +144,16 @@ def compute_metrics(p: EvalPrediction):
|
|
126 |
return result
|
127 |
```
|
128 |
|
129 |
-
More TBA
|
130 |
-
|
131 |
## Limitations and Bias
|
132 |
|
133 |
-
|
134 |
-
|
135 |
-
TBA
|
136 |
|
137 |
## Citation
|
138 |
|
139 |
-
This metric was developed as part of an IEA R&D project, [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al, 2024). The metric
|
140 |
-
|
141 |
-
TBA
|
142 |
|
143 |
## Further References
|
144 |
|
145 |
-
|
146 |
-
|
147 |
-
|
|
|
16 |
|
17 |
## Metric Description
|
18 |
|
19 |
+
ISCO‑08 is a four‑level taxonomy (Table 1). Correctly locating an occupation at the appropriate level (Major, Sub‑major, Minor or Unit) is essential, yet many practical systems make small errors whose predictions remain useful (e.g. confusing two Unit groups within the same Minor group). The present metric extends classical precision, recall and *F*‑measure to this hierarchical setting, thereby providing a more nuanced assessment than flat accuracy.
|
20 |
|
21 |
|Digits|Group level |
|
22 |
|--|--|
|
|
|
25 |
| 3-digits | Minor groups |
|
26 |
| 4-digits | Unit groups |
|
27 |
|
28 |
+

|
|
|
|
|
29 |
|
30 |
In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
|
31 |
|
|
|
48 |
|
49 |
Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
|
50 |
|
51 |
+
The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision (ℎ𝑃) and hierarchical recall (ℎ𝑅). In this framework, each sample belongs not only to its designated class but also to all ancestor categories in the hierarchy. The root is excluded, since all samples belong to it by default. This adjustment allows the measure to account for the hierarchical structure of the classification scheme, rewarding more accurate location of a sample within the hierarchy and penalizing errors based on their hierarchical significance.
|
52 |
|
53 |
+
To calculate the hierarchical measure, extend the set of real classes
|
54 |
|
55 |
+
$$C_i = \{G\}$$
|
56 |
|
57 |
+
with all ancestors of 𝐺:
|
58 |
|
59 |
+
$$\vec{C}_i = \{B, C, E, G\}$$
|
60 |
|
61 |
+
Similarly, extend the set of predicted classes
|
62 |
|
63 |
+
$$C^′_i = \{F\}$$
|
64 |
|
65 |
+
with all ancestors of 𝐹:
|
66 |
|
67 |
+
$$\vec{C}^′_i = \{C, F\}$$
|
68 |
|
69 |
+
Class 𝐶 is the only correctly assigned label from the extended sets:
|
|
|
|
|
70 |
|
71 |
+
$$| \vec{C}_i ∩ \vec{C}^′_i| = 1$$
|
72 |
|
73 |
+
There are
|
74 |
|
75 |
+
$$| \vec{C}^′_i| = 2$$
|
76 |
|
77 |
+
predicted labels and
|
78 |
|
79 |
+
$$| \vec{C}_i| = 4$$
|
80 |
+
|
81 |
+
real classes.
|
82 |
+
|
83 |
+
Therefore,
|
84 |
+
|
85 |
+
$$hP = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}^′_i |} = \frac{1}{2}$$
|
86 |
|
87 |
+
$$hR = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}_i |} = \frac{1}{4}$$
|
88 |
+
|
89 |
+
Finally, combine ℎ𝑃 and ℎ𝑅 into the hierarchical 𝐹-measure:
|
90 |
+
|
91 |
+
$$hF_β = \frac{(β^2 + 1) · hP · hR}{(β^2 · hP + hR)}, β ∈ [0, +∞)$$
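The worked example above can be checked numerically. The following is a minimal sketch, not the metric's actual implementation: the function name `hierarchical_scores` is hypothetical, and it scores a single sample from label sets that are assumed to be already extended with their ancestors.

```python
def hierarchical_scores(true_ext, pred_ext, beta=1.0):
    """Return (hP, hR, hF_beta) for one sample.

    Both arguments are sets of labels already extended with their
    ancestors (root excluded). Hypothetical helper for illustration only.
    """
    overlap = len(true_ext & pred_ext)  # correctly assigned labels
    hp = overlap / len(pred_ext) if pred_ext else 0.0
    hr = overlap / len(true_ext) if true_ext else 0.0
    denom = beta**2 * hp + hr
    hf = (beta**2 + 1) * hp * hr / denom if denom else 0.0
    return hp, hr, hf


# Extended sets from the worked example: true class G, predicted class F.
true_ext = {"B", "C", "E", "G"}
pred_ext = {"C", "F"}

hp, hr, hf = hierarchical_scores(true_ext, pred_ext)
print(hp, hr, round(hf, 4))  # 0.5 0.25 0.3333
```

With one shared label out of two predicted and four real labels, the sketch gives a precision of 1/2, a recall of 1/4, and, for β = 1, an F-measure of 1/3.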
|
92 |
+
|
93 |
+
The metric **rewards depth**: when the true code lies under Minor group *221*, predicting *221* instead of *22* yields a higher *hF* because more of the true code's ancestors are matched. It **penalises distance**: predicting a Unit in a different Sub‑major group incurs a larger loss than confusing two Units under the same Minor group.
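Both properties can be illustrated with a short per-sample sketch. It assumes that the ancestors of an ISCO‑08 code are its leading-digit prefixes (for example, `2211` extends to `2`, `22`, `221`, `2211`); the helper names `extend_with_ancestors` and `hf_measure` are hypothetical and are not taken from the metric's source code.

```python
def extend_with_ancestors(code):
    """Return the code plus its ancestors, assumed to be its digit prefixes."""
    return {code[:i] for i in range(1, len(code) + 1)}


def hf_measure(reference, prediction, beta=1.0):
    """Hierarchical F-measure for a single reference/prediction pair."""
    true_ext = extend_with_ancestors(reference)
    pred_ext = extend_with_ancestors(prediction)
    overlap = len(true_ext & pred_ext)
    if overlap == 0:
        return 0.0
    hp = overlap / len(pred_ext)  # hierarchical precision
    hr = overlap / len(true_ext)  # hierarchical recall
    return (beta**2 + 1) * hp * hr / (beta**2 * hp + hr)


reference = "2211"  # illustrative 4-digit Unit group code

# Rewards depth: a deeper correct prefix scores higher.
shallow = hf_measure(reference, "22")
deeper = hf_measure(reference, "221")

# Penalises distance: a sibling Unit scores higher than a Unit
# in a different Sub-major group.
sibling = hf_measure(reference, "2212")
distant = hf_measure(reference, "2310")

print(deeper > shallow, sibling > distant)  # True True
```

The sketch scores one pair at a time; how the metric aggregates scores over a full list of references and predictions is not specified here and is left out on purpose.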
|
94 |
+
|
95 |
+
## How to Use
|
96 |
+
|
97 |
+
- **Model evaluation**: Assess neural or rule‑based classifiers that map free‑text occupation descriptions to ISCO‑08 codes.
|
98 |
+
- **Inter‑rater agreement**: Quantify consistency between human coders when full agreement at the Unit level is not always expected.
|
99 |
+
|
100 |
+
### Inputs
|
101 |
|
|
|
102 |
- **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
|
103 |
- **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). These are the classifications to compare against the ground truth.
|
104 |
|
105 |
### Output Values
|
106 |
|
|
|
|
|
107 |
**Example output**:
|
108 |
|
109 |
```python
|
|
|
119 |
|
120 |
#### Values from Popular Papers
|
121 |
|
122 |
+
The following table is from the paper [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf).
|
123 |
|
124 |
+
**Summary of multilingual model overall accuracies and hF-measure scores**
|
125 |
|
126 |
+
| Model name | Training dataset (training & validation splits) | Evaluation dataset (test split) | Accuracy | hFβ |
|
127 |
+
|------------|--------------------------------------------------|---------------------------------|----------|-----|
|
128 |
+
| Model 1 | ICILS | ICILS | 63% | 0.89|
|
129 |
+
| Model 2 | ILO | ILO | 92% | 0.99|
|
130 |
+
| Model 2 | ILO | ICILS | 36% | 0.94|
|
131 |
+
| Model 3 | ICILS+ILO | ICILS+ILO | 80% | 0.93|
|
132 |
+
| Model 3 | ICILS+ILO | ICILS | 62% | 0.95|
|
133 |
|
134 |
+
### Examples
|
135 |
|
136 |
```python
|
137 |
def compute_metrics(p: EvalPrediction):
|
|
|
144 |
return result
|
145 |
```
|
146 |
|
|
|
|
|
147 |
## Limitations and Bias
|
148 |
|
149 |
+
No known limitations or bias.
|
|
|
|
|
150 |
|
151 |
## Citation
|
152 |
|
153 |
+
This metric was developed as part of an [IEA R&D project](https://www.iea.nl/publications/other/rd-outcomes), [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024). The metric was used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.
|
|
|
|
|
154 |
|
155 |
## Further References
|
156 |
|
157 |
+
- [The International Standard Classification of Occupations (ISCO-08)](https://isco.ilo.org/en/isco-08/)
|
158 |
+
- [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024)
|
159 |
+
- [ICILS XLM-R ISCO model](https://huggingface.co/ICILS/xlm-r-icils-ilo)
|