Update README.md
README.md
CHANGED
@@ -1,25 +1,24 @@
 ---
 license: unknown
 ---
+
 # enwiki_to_arwiki_categories Dataset
 
 This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.
 
 ## Files
 
-
-
-This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.
-
-### filtered_data.json
-
-This file contains the mappings after filtering out those that do not contain a 4-digit year. It contains 231,349 mappings.
-
-### cats_2000.json
-
-This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.
+1. **langlinks.json (818,354)**
+* This file contains all category links from enwiki to arwiki.
 
-
+2. **filtered_data.json (231,314)**
+* This file contains the mappings after filtering out those that do not contain a 4-digit year.
 
-
+3. **cats_2000.json (231,314)**
+* This file contains the mappings from `filtered_data.json` with these changes:
+1. Replacing all 4-digit years with the year `2000`.
 
+4. **cats_2000_country.json (1,234)**
+* This file contains the mappings from `filtered_data.json` with these changes:
+1. Replacing all 4-digit years with the year `2000`.
+2. Replacing country names with the word `country`. It contains 1,234 mappings.
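
The new README describes three processing steps: keep only mappings that contain a 4-digit year, replace each year with `2000`, and replace country names with `country`. They could be sketched in Python roughly as follows. This is a minimal illustration, not the dataset author's script. It assumes the mappings load as a dict from English to Arabic category titles, that a "4-digit year" means a standalone run of exactly four digits, and that country names come from a caller-supplied list. The function names, the sample titles, and the country list are all hypothetical.

```python
import re

# Assumption: a "4-digit year" is a run of exactly four digits
# that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")


def has_year(title: str) -> bool:
    """Return True if the title contains a standalone 4-digit number."""
    return YEAR_RE.search(title) is not None


def normalize_year(title: str) -> str:
    """Replace every standalone 4-digit number with the literal 2000."""
    return YEAR_RE.sub("2000", title)


def filter_and_normalize(mappings: dict[str, str]) -> list[tuple[str, str]]:
    """Keep pairs whose English title has a year, then normalize both sides.

    Returns a list of pairs rather than a dict, so pairs that become
    identical after normalization are not merged.
    """
    return [
        (normalize_year(en), normalize_year(ar))
        for en, ar in mappings.items()
        if has_year(en)
    ]


def replace_country(title: str, countries: list[str]) -> str:
    """Replace each listed country name (hypothetical list) with 'country'."""
    # Longest names first, so "South Sudan" is handled before "Sudan".
    for name in sorted(countries, key=len, reverse=True):
        title = re.sub(r"\b" + re.escape(name) + r"\b", "country", title)
    return title


# Illustrative sample data, not taken from the dataset files.
sample = {
    "Category:1990 births": "تصنيف:مواليد 1990",
    "Category:2015 films": "تصنيف:أفلام إنتاج 2015",
    "Category:Physics": "تصنيف:فيزياء",
}
pairs = filter_and_normalize(sample)
```

Returning a list keeps the number of entries unchanged by the year replacement, which matches the equal counts the new README gives for `filtered_data.json` and `cats_2000.json`. Whether the actual files were produced this way is not stated in the source.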