## Dataset Processing

### Our Benchmark (processed OIE2016)

First, download our benchmark tailored for compact extractions from [`here`](https://zenodo.org/record/7014032#.YwQQ0OzMJb8) and place it under [`data/OIE2016(processed)`](https://github.com/FarimaFatahi/CompactIE/tree/master/data/OIE2016(processed)).

Second, create the train, development, and test splits for the constituent extraction model by running:

```
cd "OIE2016(processed)/constituent_model"
python process_constituent_data.py
```

Lastly, create the train, development, and test splits for the constituent linking model by running:

```
cd "OIE2016(processed)/relation_model"
python process_linking_data.py
```

Note that the training data folder for each model is already set to the corresponding folder mentioned above.
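
The actual splitting logic lives in `process_constituent_data.py` and `process_linking_data.py`. For readers unfamiliar with the idea, the following is only a minimal sketch of a deterministic three-way split. The function name `split_dataset`, the 80/10/10 ratios, and the seed are assumptions made for illustration and are not taken from the repository scripts.

```python
import random


def split_dataset(examples, dev_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle a copy of `examples` with a fixed seed and cut it into three parts.

    Hypothetical helper: the ratios and seed are illustrative defaults only.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible

    n_dev = int(len(shuffled) * dev_ratio)
    n_test = int(len(shuffled) * test_ratio)
    n_train = len(shuffled) - n_dev - n_test

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

Using a seeded `random.Random` instance rather than the global generator keeps the split stable across runs without affecting other code that relies on `random`.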

### Evaluation Benchmarks

Three evaluation benchmarks (**BenchIE**, **CaRB**, and **Wire57**) are used to evaluate CompactIE's performance. Note that since these datasets do not target compact triples, we exclude triples that have at least one clause within a constituent.

To get the final data (JSON format) for these benchmarks, run:

```bash
./process_test_data.sh
```

### Other files

Since the schema design of the table filling model does not support conjunctions inside constituents, we use the conjunction module developed by [`OpenIE6`](https://github.com/dair-iitd/openie6) to break sentences into smaller conjunction-free sentences before passing them to the system.

Therefore, for a new test file (`source_file.txt`), first produce its conjunction file (`conjunctions.txt`) and then run:

```
python process.py --source_file source_file.txt --target_file output.json --conjunctions_file conjunctions.txt
```
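
To make the role of the conjunction file more concrete, here is a small sketch of how such a file could be read into a mapping from each original sentence to its conjunction-free sentences. The layout is an assumption for illustration (blank-line-separated blocks, with the original sentence on the first line and its splits on the following lines); check the actual output of the conjunction module before relying on it. The helper `parse_conjunction_blocks` is hypothetical and is not part of `process.py`.

```python
import re


def parse_conjunction_blocks(text):
    """Map each original sentence to its conjunction-free sentences.

    Assumed layout (not confirmed by the repository): blocks separated by
    blank lines; line 1 is the original sentence, the rest are its splits.
    """
    mapping = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        original, splits = lines[0], lines[1:]
        # A sentence without conjunctions has no splits; keep it unchanged.
        mapping[original] = splits or [original]
    return mapping


example = (
    "Alice sings and dances .\n"
    "Alice sings .\n"
    "Alice dances .\n"
    "\n"
    "Bob sleeps .\n"
)
result = parse_conjunction_blocks(example)
print(result)
```

Falling back to the original sentence when a block has no splits means every input sentence still yields at least one sentence to pass on to the system.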

### Compactness measurement

To measure the compactness metrics mentioned in the paper (AL, NCC, RPA), set the `INPUT_FILE` variable inside the following script to the path of the test file and run it:

```
python compactness_measurements.py
```