## Dataset Processing
### Our Benchmark (processed OIE2016)
First, download our benchmark tailored for compact extractions, provided [`here`](https://zenodo.org/record/7014032#.YwQQ0OzMJb8), and place it under [`data/OIE2016(processed)`](https://github.com/FarimaFatahi/CompactIE/tree/master/data/OIE2016(processed)).
Second, split out the train, development, and test sets for the constituent extraction model by running:
```bash
cd "OIE2016(processed)/constituent_model"
python process_constituent_data.py
```
Lastly, split out the train, development, and test sets for the constituent linking model by running:
```bash
cd "OIE2016(processed)/relation_model"
python process_linking_data.py
```
Note that the data folders used for training each model are set to the ones produced above.
### Evaluation Benchmarks
Three evaluation benchmarks (**BenchIE**, **CaRB**, and **Wire57**) are used to evaluate CompactIE's performance. Note that since these datasets do not target compact triples, we exclude triples that contain at least one clause within a constituent.
To get the final data (JSON format) for these benchmarks, run:
```bash
./process_test_data.sh
```
### Other files
Since the schema design of the table-filling model does not support conjunctions inside constituents, we use the conjunction module developed by [`OpenIE6`](https://github.com/dair-iitd/openie6) to break sentences into smaller, conjunction-free sentences before passing them to the system.
Therefore, for a new test file (`source_file.txt`), first produce the conjunction file (`conjunctions.txt`) and then run:
```bash
python process.py --source_file source_file.txt --target_file output.json --conjunctions_file conjunctions.txt
```
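
To make the idea concrete, here is a toy, made-up example of what conjunction decomposition produces; it does not invoke the OpenIE6 module, and the sentence is illustrative only.
```python
# Toy illustration only (hard-coded, made-up example; not the OpenIE6 module):
# a sentence with a coordinated verb phrase is rewritten into
# conjunction-free sentences that the system then processes independently.
coordinated_sentence = "The company designs and manufactures electric vehicles."
conjunction_free_sentences = [
    "The company designs electric vehicles.",
    "The company manufactures electric vehicles.",
]

for simple_sentence in conjunction_free_sentences:
    # Each simple sentence can now yield a compact triple without a
    # conjunction inside any constituent.
    print(simple_sentence)
```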
### Compactness measurement
To measure the compactness metrics mentioned in the paper (AL, NCC, RPA), set the `INPUT_FILE` variable inside the following script to the test file path and run it as follows:
```bash
python compactness_measurements.py
```
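
The authoritative definitions of AL, NCC, and RPA are those in the script and the paper; as one hedged illustration, if AL is read as the average number of words per extraction (an assumption made here for the sketch, along with the line-delimited JSON layout and the `arg1`/`rel`/`arg2` field names), it could be computed roughly as follows:
```python
# Hedged sketch only: an average-length style statistic over extractions.
# The file layout (one JSON object per line) and the field names are
# assumptions for illustration, not the schema used by compactness_measurements.py.
import json

def average_extraction_length(path: str) -> float:
    lengths = []
    with open(path) as f:
        for line in f:
            triple = json.loads(line)
            words = " ".join(triple[k] for k in ("arg1", "rel", "arg2")).split()
            lengths.append(len(words))
    return sum(lengths) / len(lengths) if lengths else 0.0

print(average_extraction_length("output.json"))  # hypothetical input path
```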