Ibrahemqasim committed on
Commit 04ee729 · verified · 1 Parent(s): a814d50

Update README.md

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -1,25 +1,24 @@
  ---
  license: unknown
  ---
+
  # enwiki_to_arwiki_categories Dataset

  This dataset contains mappings between English Wikipedia categories and their corresponding Arabic Wikipedia categories.

  ## Files

- ### langlinks.json
-
- This file contains the original mappings as downloaded from the Hugging Face Hub. It contains 818,354 mappings.
-
- ### filtered_data.json
-
- This file contains the mappings after filtering out those that do not contain a 4-digit year. It contains 231,349 mappings.
-
- ### cats_2000.json
-
- This file contains the mappings after replacing all 4-digit years with the year 2000. It contains 20,913 mappings.
+ 1. **langlinks.json (818,354)**
+ * This file contains all category links from enwiki to arwiki.

- ### cats_2000_contry.json
+ 2. **filtered_data.json (231,314)**
+ * This file contains the mappings after filtering out those that do not contain a 4-digit year.

- This file contains the mappings after replacing all 4-digit years with the year 2000 and replacing country names with `country` word. It contains 538 mappings.
+ 3. **cats_2000.json (231,314)**
+ * This file contains the mappings from `filtered_data.json` with these changes:
+ 1. Replacing all 4-digit years with the year `2000`.

+ 4. **cats_2000_country.json (1,234)**
+ * This file contains the mappings from `filtered_data.json` with these changes:
+ 1. Replacing all 4-digit years with the year `2000`.
+ 2. Replacing country names with the word `country`. It contains 1,234 mappings.
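
The three processing steps the README describes (keep mappings that contain a 4-digit year, replace every such year with `2000`, replace country names with the word `country`) could be sketched in Python roughly as below. This is a minimal illustration, not the script used to build the files. It assumes each file is a JSON object mapping an English category title to its Arabic counterpart, and that the year filter looks at the English title; neither is stated in the README. The `COUNTRIES` table and all function names are hypothetical.

```python
import re

# Assumption: a "4-digit year" is any run of exactly four digits.
YEAR_RE = re.compile(r"(?<!\d)\d{4}(?!\d)")

# Hypothetical two-entry country table (English name -> Arabic name),
# for illustration only; a real run would need a full list.
COUNTRIES = {"France": "فرنسا", "Egypt": "مصر"}


def filter_with_year(mappings):
    """Keep only pairs whose English title contains a 4-digit year."""
    return {en: ar for en, ar in mappings.items() if YEAR_RE.search(en)}


def normalize_years(mappings):
    """Replace every 4-digit year on both sides with the literal 2000."""
    return {
        YEAR_RE.sub("2000", en): YEAR_RE.sub("2000", ar)
        for en, ar in mappings.items()
    }


def replace_countries(mappings):
    """Replace known country names on both sides with the word 'country'."""
    out = {}
    for en, ar in mappings.items():
        for en_name, ar_name in COUNTRIES.items():
            if en_name in en:
                en = en.replace(en_name, "country")
                ar = ar.replace(ar_name, "country")
        out[en] = ar
    return out


sample = {
    "1990 in France": "1990 في فرنسا",
    "1991 in France": "1991 في فرنسا",
    "Rivers of Egypt": "أنهار مصر",
}
filtered = filter_with_year(sample)      # drops "Rivers of Egypt"
normalized = normalize_years(filtered)   # both titles become "2000 in France"
generic = replace_countries(normalized)  # "France" becomes "country"
```

Because this sketch stores the mappings in a dict, titles that differ only by year collapse into a single key after normalization. The README lists the same count (231,314) for `filtered_data.json` and `cats_2000.json`, so the published files may not deduplicate this way; how they handle such repeats is not stated.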