danieldux committed on
Commit
b0ebc4d
·
verified ·
1 Parent(s): c09e79b

Update README.md

Fix math syntax and edit some sections.

Files changed (1)
  1. README.md +52 -40
README.md CHANGED
@@ -16,7 +16,7 @@ pinned: false
16
 
17
  ## Metric Description
18
 
19
- The hierarchical structure of the ISCO-08 classification scheme, as depicted in the Figure 1, is organized into four levels, delineated by the specificity of their codes:
20
 
21
  |Digits|Group level |
22
  |--|--|
@@ -25,9 +25,7 @@ The hierarchical structure of the ISCO-08 classification scheme, as depicted in
25
  | 3-digits | Minor groups |
26
  | 4-digits | Unit groups |
27
 
28
- *Figure 1: Hierarchical structure of the ISCO-08 classification scheme*
29
-
30
- ![Figure 1: ISCO-08 DAG class hierarchy. The filled node 2211 represents the real category of a sample](https://huggingface.co/spaces/danieldux/isco_hierarchical_accuracy/resolve/main/figure_1.png)
31
 
32
  In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
33
 
@@ -50,48 +48,62 @@ The measure applies a higher penalty for errors that occur between more distant
50
 
51
  Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
52
 
53
- The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision ($hP$) and hierarchical recall ($hR$). In this framework, each sample belongs not only to its designated class but also to all ancestor categories in the hierarchy, excluding the root (we exclude the root of the graph, since all samples belong to the root by default). This adjustment allows the measure to account for the hierarchical structure of the classification scheme, rewarding more accurate location of a sample within the hierarchy and penalizing errors based on their hierarchical significance.
54
 
55
- To calculate the hierarchical measure, we extend the set of real classes $C_i = \{G\}$ with all ancestors of class $G:\vec{C}_i = \{B, C, E, G\}$.
56
 
57
- We also extend the set of predicted classes $C^′_i = \{F\}$ with all ancestors of class $F : \vec{C}^′_i = \{C, F\}$.
58
 
59
- So, class $C$ is the only correctly assigned label from the extended set:$| \vec{C}_i ∩ \vec{C}^′_i| = 1$.
60
 
61
- There are $| \vec{C}^′_i| = 2$ assigned labels and $| \vec{C}_i| = 4$ real classes.
62
 
63
- Therefore, we get:
64
 
65
- $hP = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}^′_i |} = \frac{1}{2}$
66
 
67
- $hR = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}_i |} = \frac{1}{2}$
68
 
69
- We also can combine the two values $hP$ and $hR$ into one hF-measure:
70
 
71
- $$
72
- hF_β = \frac{(β^2 + 1) · hP · hR}{(β^2 · hP + hR)}, β ∈ [0, +∞)
73
- $$
74
 
75
- ## How to Use
76
 
77
- *Give general statement of how to use the metric*
78
 
79
- *Provide simplest possible example for using the metric*
80
 
81
- TBA
82
 
83
- ### Inputs
 
 
 
 
 
 
84
 
85
- *List all input arguments in the format below*
 
 
 
 
 
 
 
 
 
 
 
 
 
86
 
87
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
88
  - **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
89
  - **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). This is the predicted classification or classification to compare against the ground truth.
90
 
91
  ### Output Values
92
 
93
- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
94
-
95
  **Example output**:
96
 
97
  ```python
@@ -107,13 +119,19 @@ Values are decimal numbers between 0 and 1. Higher scores are better.
107
 
108
  #### Values from Popular Papers
109
 
110
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
111
 
112
- TBA
113
 
114
- ### Examples
 
 
 
 
 
 
115
 
116
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
117
 
118
  ```python
119
  def compute_metrics(p: EvalPrediction):
@@ -126,22 +144,16 @@ def compute_metrics(p: EvalPrediction):
126
  return result
127
  ```
128
 
129
- More TBA
130
-
131
  ## Limitations and Bias
132
 
133
- *Note any known limitations or biases that the metric has, with links and references if possible.*
134
-
135
- TBA
136
 
137
  ## Citation
138
 
139
- This metric was developed as part of an IEA R&D project, [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al, 2024). The metric is used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.
140
-
141
- TBA
142
 
143
  ## Further References
144
 
145
- *Add any useful further references.*
146
-
147
- TBA
 
16
 
17
  ## Metric Description
18
 
19
+ ISCO-08 is a four-level taxonomy (see the table below). Correctly locating an occupation at the appropriate level (Major, Sub-major, Minor or Unit) is essential, yet many practical systems make near-miss errors that still carry useful information (e.g. confusing two Unit groups within the same Minor group). The present metric extends classical precision, recall and *F*-measure to this hierarchical setting, thereby providing a more nuanced assessment than flat accuracy.
20
 
21
  |Digits|Group level |
22
  |--|--|
 
25
  | 3-digits | Minor groups |
26
  | 4-digits | Unit groups |
27
 
28
+ ![Figure 1 – An excerpt of the ISCO-08 hierarchy showing the ancestors of node 2211](https://huggingface.co/spaces/danieldux/isco_hierarchical_accuracy/resolve/main/figure_1.png)
 
 
29
 
30
  In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
31
 
 
48
 
49
  Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
50
 
51
+ The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision (ℎ𝑃) and hierarchical recall (ℎ𝑅). In this framework, each sample belongs not only to its designated class but also to all ancestor categories in the hierarchy. The root is excluded, since all samples belong to it by default. This adjustment allows the measure to account for the hierarchical structure of the classification scheme, rewarding more accurate location of a sample within the hierarchy and penalizing errors based on their hierarchical significance.
52
 
53
+ To calculate the hierarchical measure, extend the set of real classes
54
 
55
+ $$C_i = \{G\}$$
56
 
57
+ with all ancestors of 𝐺:
58
 
59
+ $$\vec{C}_i = \{B, C, E, G\}$$
60
 
61
+ Similarly, extend the set of predicted classes
62
 
63
+ $$C^′_i = \{F\}$$
64
 
65
+ with all ancestors of 𝐹:
66
 
67
+ $$\vec{C}^′_i = \{C, F\}$$
68
 
69
+ Class 𝐶 is the only correctly assigned label from the extended sets:
 
 
70
 
71
+ $$| \vec{C}_i ∩ \vec{C}^′_i| = 1$$
72
 
73
+ There are
74
 
75
+ $$| \vec{C}^′_i| = 2$$
76
 
77
+ predicted labels and
78
 
79
+ $$| \vec{C}_i| = 4$$
80
+
81
+ real classes.
82
+
83
+ Therefore,
84
+
85
+ $$hP = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}^′_i |} = \frac{1}{2}$$
86
 
87
+ $$hR = \frac{| \vec{C}_i ∩ \vec{C}^′_i|} {|\vec{C}_i |} = \frac{1}{4}$$
88
+
89
+ Finally, combine ℎ𝑃 and ℎ𝑅 into the hierarchical 𝐹-measure:
90
+
91
+ $$hF_β = \frac{(β^2 + 1) · hP · hR}{(β^2 · hP + hR)}, β ∈ [0, +∞)$$
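As a check on the arithmetic, the worked example can be evaluated with β = 1, using only the set sizes stated above (one shared label, two predicted labels, four real labels). This is an illustrative calculation, not text from the committed README:

$$hP = \frac{1}{2}, \quad hR = \frac{1}{4}, \quad hF_1 = \frac{2 \cdot \frac{1}{2} \cdot \frac{1}{4}}{\frac{1}{2} + \frac{1}{4}} = \frac{1}{3}$$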
92
+
93
+ The metric **rewards depth**: when the true code is *2211*, predicting *221* instead of *22* yields a higher *hF* because more ancestors overlap. It **penalises distance**: predicting a Unit in a different Sub-major group incurs a larger loss than confusing two Units under the same Minor group.
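The depth and distance behaviour can be illustrated with a minimal per-sample sketch. It is not the Space's actual implementation. It assumes that the ancestors of an ISCO-08 code are its leading-digit prefixes (per the digits table above), and the helper names `isco_ancestors` and `hierarchical_scores` are hypothetical. How the real metric aggregates scores over a whole dataset is not covered here.

```python
from typing import Set, Tuple


def isco_ancestors(code: str) -> Set[str]:
    """Return the code together with its ancestors, excluding the root.

    Assumption: every non-empty prefix of an ISCO-08 code names one of its
    ancestor groups, e.g. "2211" -> {"2", "22", "221", "2211"}.
    """
    return {code[:i] for i in range(1, len(code) + 1)}


def hierarchical_scores(
    reference: str, prediction: str, beta: float = 1.0
) -> Tuple[float, float, float]:
    """Compute per-sample hP, hR and hF_beta for one reference/prediction pair."""
    ref_set = isco_ancestors(reference)
    pred_set = isco_ancestors(prediction)
    overlap = len(ref_set & pred_set)

    # Guard against empty codes so the divisions below cannot fail.
    hp = overlap / len(pred_set) if pred_set else 0.0
    hr = overlap / len(ref_set) if ref_set else 0.0

    denom = beta**2 * hp + hr
    hf = (beta**2 + 1) * hp * hr / denom if denom else 0.0
    return hp, hr, hf


if __name__ == "__main__":
    # True code 2211; predictions range from shallow to distant.
    for pred in ["22", "221", "2212", "2310", "3211"]:
        hp, hr, hf = hierarchical_scores("2211", pred)
        print(f"{pred}: hP={hp:.2f} hR={hr:.2f} hF={hf:.2f}")
```

Under these assumptions, predicting *22* for a true *2211* gives hF of about 0.67 and predicting *221* gives about 0.86. A sibling Unit (*2212*) scores 0.75, a Unit in another Sub-major group (*2310*) scores 0.25, and a Unit in another Major group (*3211*) scores 0.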
94
+
95
+ ## How to Use
96
+
97
+ - **Model evaluation**: Assess neural or rule-based classifiers that map free-text occupation descriptions to ISCO-08 codes.
98
+ - **Inter-rater agreement**: Quantify consistency between human coders when full agreement at the Unit level is not always expected.
99
+
100
+ ### Inputs
101
 
 
102
  - **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
103
  - **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). This is the predicted classification or classification to compare against the ground truth.
104
 
105
  ### Output Values
106
 
 
 
107
  **Example output**:
108
 
109
  ```python
 
119
 
120
  #### Values from Popular Papers
121
 
122
+ The following table is from the paper [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf).
123
 
124
+ **Summary of multilingual model overall accuracies and hF-measure scores**
125
 
126
+ | Model name | Training dataset (training & validation splits) | Evaluation dataset (test split) | Accuracy | hFβ |
127
+ |------------|--------------------------------------------------|---------------------------------|----------|-----|
128
+ | Model 1 | ICILS | ICILS | 63% | 0.89|
129
+ | Model 2 | ILO | ILO | 92% | 0.99|
130
+ | Model 2 | ILO | ICILS | 36% | 0.94|
131
+ | Model 3 | ICILS+ILO | ICILS+ILO | 80% | 0.93|
132
+ | Model 3 | ICILS+ILO | ICILS | 62% | 0.95|
133
 
134
+ ### Examples
135
 
136
  ```python
137
  def compute_metrics(p: EvalPrediction):
 
144
  return result
145
  ```
146
 
 
 
147
  ## Limitations and Bias
148
 
149
+ No known limitations or bias.
 
 
150
 
151
  ## Citation
152
 
153
+ This metric was developed as part of an [IEA R&D project](https://www.iea.nl/publications/other/rd-outcomes), [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024). The metric was used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.
 
 
154
 
155
  ## Further References
156
 
157
+ - [The International Standard Classification of Occupations (ISCO-08)](https://isco.ilo.org/en/isco-08/)
158
+ - [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024)
159
+ - [ICILS XLM-R ISCO model](https://huggingface.co/ICILS/xlm-r-icils-ilo)