Spaces: danieldux/isco_hierarchical_accuracy


danieldux committed on Apr 18

Commit b0ebc4d · verified · 1 Parent(s): c09e79b

Update README.md


Fix math syntax and edit some sections.

Files changed (1)

README.md +52 -40


@@ -16,7 +16,7 @@ pinned: false
 ## Metric Description
-The hierarchical structure of the ISCO-08 classification scheme, as depicted in the Figure 1, is organized into four levels, delineated by the specificity of their codes:
 |Digits|Group level  |
 |--|--|
@@ -25,9 +25,7 @@ The hierarchical structure of the ISCO-08 classification scheme, as depicted in
 | 3-digits | Minor groups |
 | 4-digits | Unit groups |
-*Figure 1: Hierarchical structure of the ISCO-08 classification scheme*
-![Figure 1: ISCO-08 DAG class hierarchy. The filled node 2211 represents the real category of a sample](https://huggingface.co/spaces/danieldux/isco_hierarchical_accuracy/resolve/main/figure_1.png)
 In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
@@ -50,48 +48,62 @@ The measure applies a higher penalty for errors that occur between more distant
 Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
-The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision ($hP$) and hierarchical recall ($hR$). In this framework, each sample belongs not only to its designated class but also to all ancestor categories in the hierarchy, excluding the root (we exclude the root of the graph, since all samples belong to the root by default). This adjustment allows the measure to account for the hierarchical structure of the classification scheme, rewarding more accurate location of a sample within the hierarchy and penalizing errors based on their hierarchical significance.
-To calculate the hierarchical measure, we extend the set of real classes $C_i = \{G\}$ with all ancestors of class $G:\vec{C}_i = \{B, C, E, G\}$.
-We also extend the set of predicted classes $C^′_i = \{F\}$ with all ancestors of class $F : \vec{C}^′_i = \{C, F\}$.
-So, class $C$ is the only correctly assigned label from the extended set:$| \vec{C}_i ∩ \vec{C}^′_i| = 1$.
-There are $| \vec{C}^′_i| = 2$ assigned labels and $| \vec{C}_i| = 4$ real classes.
-Therefore, we get:
-$hP = \frac{| \vec{C}_i ∩ \vec{C}^′_i|}  {|\vec{C}^′_i |} = \frac{1}{2}$
-$hR = \frac{| \vec{C}_i ∩ \vec{C}^′_i|}  {|\vec{C}_i |} = \frac{1}{2}$
-We also can combine the two values $hP$ and $hR$ into one hF-measure:
-$$
-hF_β = \frac{(β^2 + 1) · hP · hR}{(β^2 · hP + hR)}, β ∈ [0, +∞)
-$$
-## How to Use
-*Give general statement of how to use the metric*
-*Provide simplest possible example for using the metric*
-TBA
-### Inputs
-*List all input arguments in the format below*
-- **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
 - **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
 - **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). These are the classifications to compare against the ground truth.
 ### Output Values
-*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
 **Example output**:
 ```python
@@ -107,13 +119,19 @@ Values are decimal numbers between 0 and 1. Higher scores are better.
 #### Values from Popular Papers
-*Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
-TBA
-### Examples
-*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 ```python
 def compute_metrics(p: EvalPrediction):
@@ -126,22 +144,16 @@ def compute_metrics(p: EvalPrediction):
     return result
 ```
-More TBA
 ## Limitations and Bias
-*Note any known limitations or biases that the metric has, with links and references if possible.*
-TBA
 ## Citation
-This metric was developed as part of an IEA R&D project, [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al, 2024). The metric is used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.
-TBA
 ## Further References
-*Add any useful further references.*
-TBA

 ## Metric Description
+ISCO-08 is a four-level taxonomy (Table 1). Locating an occupation at the appropriate level (Major, Sub-major, Minor or Unit) is essential, yet many practical systems make near-miss errors that are still useful (e.g. confusing two Unit groups within the same Minor group). The metric extends classical precision, recall and *F*-measure to this hierarchical setting, giving a more nuanced assessment than flat accuracy.
 |Digits|Group level  |
 |--|--|
 | 3-digits | Minor groups |
 | 4-digits | Unit groups |
+![Figure 1 – An excerpt of the ISCO‑08 hierarchy showing the ancestors of node 2211](https://huggingface.co/spaces/danieldux/isco_hierarchical_accuracy/resolve/main/figure_1.png)
 In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:
 Misclassification among sibling categories (e.g., between different Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between different Major groups).
+The measure extends precision and recall to a hierarchical context by introducing hierarchical precision (ℎ𝑃) and hierarchical recall (ℎ𝑅). In this framework, each sample belongs not only to its designated class but also to all of that class's ancestors in the hierarchy, excluding the root, since every sample belongs to the root by default. This lets the measure account for the hierarchical structure of the classification scheme: it rewards locating a sample more accurately within the hierarchy and penalizes errors according to their hierarchical significance.
+To calculate the hierarchical measure for a sample with real class 𝐺 and predicted class 𝐹, extend the set of real classes
+$$C_i = \{G\}$$
+with all ancestors of 𝐺:
+$$\vec{C}_i = \{B, C, E, G\}$$
+Similarly, extend the set of predicted classes
+$$C'_i = \{F\}$$
+with all ancestors of 𝐹:
+$$\vec{C}'_i = \{C, F\}$$
+Class 𝐶 is the only label shared by the two extended sets:
+$$|\vec{C}_i \cap \vec{C}'_i| = 1$$
+There are
+$$|\vec{C}'_i| = 2$$
+predicted labels and
+$$|\vec{C}_i| = 4$$
+real classes.
+Therefore,
+$$hP = \frac{|\vec{C}_i \cap \vec{C}'_i|}{|\vec{C}'_i|} = \frac{1}{2}$$
+$$hR = \frac{|\vec{C}_i \cap \vec{C}'_i|}{|\vec{C}_i|} = \frac{1}{4}$$
+Finally, combine ℎ𝑃 and ℎ𝑅 into the hierarchical 𝐹-measure:
+$$hF_\beta = \frac{(\beta^2 + 1) \cdot hP \cdot hR}{\beta^2 \cdot hP + hR}, \quad \beta \in [0, +\infty)$$
+The metric **rewards depth**: when the true code is *2211*, predicting *221* instead of *22* yields a higher *hF* because more of the true code's ancestors are matched. It **penalises distance**: predicting a Unit group in a different Sub-major group incurs a larger loss than confusing two Unit groups under the same Minor group.
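The per-sample calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not the metric's actual implementation: the function names `extend_isco` and `hierarchical_scores` are hypothetical, and the sketch assumes that the ancestors of an ISCO-08 code are simply its leading-digit prefixes (so *2211* extends to *2*, *22*, *221*, *2211*).

```python
from typing import Set, Tuple


def extend_isco(code: str) -> Set[str]:
    """Return the code together with its ancestors.

    Assumption: an ISCO-08 code's ancestors are its digit prefixes,
    e.g. "2211" -> {"2", "22", "221", "2211"}. The root is excluded.
    """
    return {code[:i] for i in range(1, len(code) + 1)}


def hierarchical_scores(
    true_ext: Set[str], pred_ext: Set[str], beta: float = 1.0
) -> Tuple[float, float, float]:
    """Compute (hP, hR, hF_beta) for one sample from its extended label sets."""
    overlap = len(true_ext & pred_ext)
    hp = overlap / len(pred_ext) if pred_ext else 0.0
    hr = overlap / len(true_ext) if true_ext else 0.0
    denom = beta**2 * hp + hr
    hf = (beta**2 + 1) * hp * hr / denom if denom else 0.0
    return hp, hr, hf


# The abstract example from the text: real class G, predicted class F.
letters = hierarchical_scores({"B", "C", "E", "G"}, {"C", "F"})

# ISCO-08 examples with true code 2211.
truth = extend_isco("2211")
shallow = hierarchical_scores(truth, extend_isco("22"))    # stops at Sub-major level
deeper = hierarchical_scores(truth, extend_isco("221"))    # stops at Minor level
sibling = hierarchical_scores(truth, extend_isco("2212"))  # same Minor group
distant = hierarchical_scores(truth, extend_isco("2310"))  # different Sub-major group
```

With β = 1, the abstract example gives hP = 1/2, hR = 1/4 and hF = 1/3. For the ISCO-08 cases, `deeper` scores higher than `shallow`, and `sibling` scores higher than `distant`, which matches the depth and distance behaviour described above.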
+## How to Use
+- **Model evaluation**: Assess neural or rule-based classifiers that map free-text occupation descriptions to ISCO-08 codes.
+- **Inter-rater agreement**: Quantify consistency between human coders when full agreement at the Unit level is not always expected.
+### Inputs
 - **references** *(List[str])*: List of reference ISCO-08 codes (true labels). This is the ground truth.
 - **predictions** *(List[str])*: List of predicted ISCO-08 codes (predicted labels). These are the classifications to compare against the ground truth.
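As a rough illustration of how two such lists might be scored together, the sketch below sums the overlaps and set sizes over all samples before dividing (a micro-averaged form of ℎ𝑃 and ℎ𝑅). This is an assumption about the aggregation, not a description of the metric's actual code; the function name `corpus_hierarchical_scores` and the returned dictionary keys are hypothetical, and ancestors are again assumed to be digit prefixes.

```python
from typing import Dict, List, Set


def _extend(code: str) -> Set[str]:
    # Assumption: ancestors of an ISCO-08 code are its digit prefixes.
    return {code[:i] for i in range(1, len(code) + 1)}


def corpus_hierarchical_scores(
    references: List[str], predictions: List[str], beta: float = 1.0
) -> Dict[str, float]:
    """Micro-averaged hP, hR and hF_beta over paired reference/prediction lists."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")

    overlap = pred_total = true_total = 0
    for ref, pred in zip(references, predictions):
        ref_ext, pred_ext = _extend(ref), _extend(pred)
        overlap += len(ref_ext & pred_ext)
        pred_total += len(pred_ext)
        true_total += len(ref_ext)

    hp = overlap / pred_total if pred_total else 0.0
    hr = overlap / true_total if true_total else 0.0
    denom = beta**2 * hp + hr
    hf = (beta**2 + 1) * hp * hr / denom if denom else 0.0
    # Hypothetical key names, chosen for this sketch only.
    return {
        "hierarchical_precision": hp,
        "hierarchical_recall": hr,
        "hierarchical_fmeasure": hf,
    }


scores = corpus_hierarchical_scores(
    references=["2211", "2211", "2310"],
    predictions=["2211", "2212", "2211"],
)
```

In this toy input the three samples share 4, 3 and 1 extended labels respectively, out of 12 predicted and 12 real labels, so all three scores come out at 2/3.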
 ### Output Values
 **Example output**:
 ```python
 #### Values from Popular Papers
+The following table is from the paper [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf).
+**Summary of multilingual model overall accuracies and hF-measure scores**
+| Model name | Training dataset (training & validation splits) | Evaluation dataset (test split) | Accuracy | hFβ |
+|------------|--------------------------------------------------|---------------------------------|----------|-----|
+| Model 1    | ICILS                                            | ICILS                           | 63%      | 0.89|
+| Model 2    | ILO                                              | ILO                             | 92%      | 0.99|
+| Model 2    | ILO                                              | ICILS                           | 36%      | 0.94|
+| Model 3    | ICILS+ILO                                        | ICILS+ILO                       | 80%      | 0.93|
+| Model 3    | ICILS+ILO                                        | ICILS                           | 62%      | 0.95|
+### Examples
 ```python
 def compute_metrics(p: EvalPrediction):
     return result
 ```
 ## Limitations and Bias
+No limitations or biases are currently known.
 ## Citation
+This metric was developed as part of an [IEA R&D project](https://www.iea.nl/publications/other/rd-outcomes), [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024). The metric was used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.
 ## Further References
+- [The International Standard Classification of Occupations (ISCO-08)](https://isco.ilo.org/en/isco-08/)
+- [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024)
+- [ICILS XLM-R ISCO model](https://huggingface.co/ICILS/xlm-r-icils-ilo)