---
license: unknown
language:
- ar
tags:
- Wikipedia
- Wikipedia_Categories
---

# enwiki_to_arwiki_categories Dataset

This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.

## Files

1. **[langlinks.json](langlinks.json) (818,354 mappings)**
   * Contains all category links from enwiki to arwiki.
   * Dataset at: [Ibrahemqasim/categories_en2ar](https://huggingface.co/datasets/Ibrahemqasim/categories_en2ar)

2. **[filtered_data.json](filtered_data.json) (231,314 mappings)**
   * Contains only the mappings that include a 4-digit year; all others are filtered out.
   * Dataset at: [Ibrahemqasim/categories_en2ar_with_years](https://huggingface.co/datasets/Ibrahemqasim/categories_en2ar_with_years)

3. **[cats_2000.json](cats_2000.json) (21,170 mappings)**
   * Contains the mappings from `filtered_data.json` with this change:
     1. Every 4-digit year is replaced with the year `2000`.
   * Dataset at: [Ibrahemqasim/categories_en2ar-cats_2000](https://huggingface.co/datasets/Ibrahemqasim/categories_en2ar-cats_2000)

4. **[cats_2000_country.json](cats_2000_country.json) (1,234 mappings)**
   * Contains the mappings from `filtered_data.json` with these changes:
     1. Every 4-digit year is replaced with the year `2000`.
     2. Every country name is replaced with the word `country`.
   * Dataset at: [Ibrahemqasim/categories_en2ar-cats_2000_contry](https://huggingface.co/datasets/Ibrahemqasim/categories_en2ar-cats_2000_contry)
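
The derivation steps described for the files above can be sketched roughly as follows. This is only an illustration, not the script that produced the dataset. It assumes each JSON file is a flat object mapping an English category title to its Arabic title, that a "year" is any standalone run of four ASCII digits, and that the placeholder `country` is used on both the English and Arabic sides. The helper names and the sample entries are hypothetical.

```python
import re

# Assumption: a "year" is four ASCII digits not embedded in a longer number.
YEAR_RE = re.compile(r"(?<![0-9])[0-9]{4}(?![0-9])")


def keep_with_year(mapping):
    """Keep only pairs whose English title contains a 4-digit year."""
    return {en: ar for en, ar in mapping.items() if YEAR_RE.search(en)}


def replace_years(text, year="2000"):
    """Replace every 4-digit year in the text with a fixed year."""
    return YEAR_RE.sub(year, text)


def replace_names(text, names, placeholder="country"):
    """Replace each listed name with the placeholder, longest names first."""
    for name in sorted(names, key=len, reverse=True):
        text = text.replace(name, placeholder)
    return text


def normalize(mapping, en_names=(), ar_names=()):
    """Apply the year replacement, then the optional country replacement."""
    out = {}
    for en, ar in mapping.items():
        en = replace_names(replace_years(en), en_names)
        ar = replace_names(replace_years(ar), ar_names)
        out[en] = ar  # pairs that become identical collapse into one entry
    return out


# Hypothetical sample in the assumed {english: arabic} layout.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:2010 in Yemen": "تصنيف:2010 في اليمن",
    "Category:2015 in Yemen": "تصنيف:2015 في اليمن",
    "Category:Living people": "تصنيف:أشخاص على قيد الحياة",
}

with_years = keep_with_year(sample)       # analogous to filtered_data.json
cats_2000 = normalize(with_years)         # analogous to cats_2000.json
cats_2000_country = normalize(            # analogous to cats_2000_country.json
    with_years, en_names=["Yemen"], ar_names=["اليمن"]
)
```

In this sketch the two `Yemen` entries become the same pair once their years are rewritten, so `cats_2000` ends up smaller than `with_years`. The same effect would be consistent with the decreasing counts listed for the files above, although the README does not state how duplicates were handled.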