Data Preparation
The first key component of the config file is the data_config section.
experiment:
  data_config:
    strategy: dataset|fixed|hierarchy
    dataloader: KnowledgeChainsLoader|DataSetLoader
    dataset_path: this/is/the/path.tsv
    root_folder: this/is/the/path
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv
    side_information:
      feature_data: this/is/the/path.tsv
      map: this/is/the/path.tsv
      features: this/is/the/path.tsv
      properties: this/is/the/path.conf
In this section, we define which input files to load and how they should be loaded.
In the following, we will consider as datasets tab-separated-value files that contain one interaction per row, in the format:
UserID ItemID Rating [TimeStamp]
where TimeStamp is optional.
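The row format above can be read with Python's standard csv module. The following is only a minimal sketch and not Elliot's own loader: the Interaction type and the read_interactions helper are hypothetical names, and it assumes that the file has no header row, that IDs are kept as strings, that ratings are numeric, and that timestamps are integers.

```python
import csv
import io
from typing import List, NamedTuple, Optional, TextIO


class Interaction(NamedTuple):
    """One row of the dataset (hypothetical container, not an Elliot type)."""
    user_id: str
    item_id: str
    rating: float
    timestamp: Optional[int]  # None when the optional fourth column is absent


def read_interactions(handle: TextIO) -> List[Interaction]:
    """Parse tab-separated rows of the form UserID ItemID Rating [TimeStamp]."""
    interactions = []
    for line_number, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_number}: expected 3 or 4 columns, got {len(fields)}")
        user_id, item_id, rating = fields[:3]
        # Assumption: timestamps, when present, are integers.
        timestamp = int(fields[3]) if len(fields) == 4 else None
        interactions.append(Interaction(user_id, item_id, float(rating), timestamp))
    return interactions


# Two sample rows: the first with a timestamp, the second without.
sample = "u1\ti1\t4.0\t978300760\nu1\ti2\t3.5\n"
interactions = read_interactions(io.StringIO(sample))
```

To read a real file, the same function can be given an open file handle, for example open(path, newline="") for a path of your choice.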
Data Preparation Strategies
According to the kind of data we have, we can choose among three different loading strategies: dataset, fixed, hierarchy.
dataset assumes that the input data has NOT been previously split into training, validation, and test sets.
For this reason, prefiltering and splitting operations can be performed later ONLY if we adopt the dataset strategy.
dataset takes just ONE parameter, dataset_path, which points to the stored dataset.
experiment:
  data_config:
    strategy: dataset
    dataset_path: this/is/the/path.tsv
The fixed strategy assumes that our data has been previously split into training/validation/test sets or training/test sets.
Since the data is assumed to be already split, no further prefiltering or splitting operation is performed.
fixed takes two mandatory parameters, train_path and test_path, and one optional parameter, validation_path.
experiment:
  data_config:
    strategy: fixed
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv
The last strategy is hierarchy.
hierarchy is designed to load a dataset that has been previously split and filtered with Elliot.
Here, the data is assumed to be already split, so no further prefiltering or splitting operations are needed.
hierarchy takes one mandatory parameter, root_folder, which points to the folder where we previously stored the split files.
experiment:
  data_config:
    strategy: hierarchy
    root_folder: this/is/the/path
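The parameter requirements of the three strategies can be summarized as a small pre-flight check. This is a sketch under stated assumptions, not Elliot's own validation logic: MANDATORY and missing_parameters are hypothetical names, and the data_config section is assumed to be already available as a plain Python dictionary.

```python
from typing import Dict, List

# Parameters each strategy needs, as described in this section.
# validation_path is optional for the fixed strategy, so it is not listed.
MANDATORY: Dict[str, List[str]] = {
    "dataset": ["dataset_path"],
    "fixed": ["train_path", "test_path"],
    "hierarchy": ["root_folder"],
}


def missing_parameters(data_config: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent for the chosen strategy."""
    strategy = data_config.get("strategy")
    if strategy not in MANDATORY:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [key for key in MANDATORY[strategy] if key not in data_config]


# A fixed-strategy section that forgot test_path.
fixed_config = {"strategy": "fixed", "train_path": "this/is/the/path.tsv"}
print(missing_parameters(fixed_config))  # ['test_path']
```

Running such a check before launching an experiment surfaces a missing path early, instead of during data loading.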