Data Preparation
The first key component of the config file is the data_config section.
experiment:
  data_config:
    strategy: dataset|fixed|hierarchy
    dataset_path: this/is/the/path.tsv
    root_folder: this/is/the/path
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv
    binarize: True
    side_information:
      - dataloader: FeatureLoader1
        map: this/is/the/path.tsv
        features: this/is/the/path.tsv
        properties: this/is/the/path.conf
      - dataloader: FeatureLoader2
        folder_map_features: this/is/the/path/folder
In this section, we define which input files to use and how they should be loaded.
In the following, we treat as datasets tab-separated-value files that contain one interaction per row, in the format:
UserID ItemID Rating [TimeStamp]
where TimeStamp is optional.
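As a minimal sketch of the row format above, the following Python snippet parses tab-separated interactions with an optional fourth column. It is an illustration only, not Elliot's loader: the names Interaction and read_interactions, the sample rows, and the choice to keep IDs as strings are all assumptions made for this example.

```python
import csv
import io
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    """One row of the dataset (hypothetical container, not an Elliot class)."""
    user_id: str
    item_id: str
    rating: float
    timestamp: Optional[int]  # None when the TimeStamp column is absent


def read_interactions(handle):
    """Parse rows of UserID, ItemID, Rating and an optional TimeStamp."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(fields)}")
        timestamp = int(fields[3]) if len(fields) == 4 else None
        rows.append(Interaction(fields[0], fields[1], float(fields[2]), timestamp))
    return rows


# Hypothetical sample data: one file with timestamps, one without.
with_ts = read_interactions(
    io.StringIO("u1\ti1\t4.0\t978300760\nu1\ti2\t3.0\t978302109\n")
)
without_ts = read_interactions(io.StringIO("u2\ti1\t5.0\n"))
```

Rows with three columns simply get timestamp set to None, which mirrors the optional TimeStamp field.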
Data Preparation Strategies
According to the kind of data we have, we can choose among three different loading strategies: dataset, fixed, and hierarchy.
The dataset strategy assumes that the input data has NOT been previously split into training, validation, and test sets.
For this reason, we can later perform prefiltering and splitting operations ONLY if we adopt the dataset strategy.
dataset takes just ONE parameter, dataset_path, which points to the stored dataset.
experiment:
  data_config:
    strategy: dataset
    dataset_path: this/is/the/path.tsv
The fixed strategy assumes that our data has been previously split into training/validation/test sets or training/test sets.
Since the data is assumed to be already split, no further prefiltering or splitting operation is performed.
fixed takes two mandatory parameters, train_path and test_path, and one optional parameter, validation_path.
experiment:
  data_config:
    strategy: fixed
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv
The last strategy is hierarchy.
hierarchy is designed to load a dataset that has been previously split and filtered with Elliot.
Here, the data is assumed to be already split, and no further prefiltering or splitting operations are needed.
hierarchy takes one mandatory parameter, root_folder, which points to the folder where we previously stored the split files.
When the fixed or dataset strategy is selected, it is also possible to use the binarize flag to transform explicit user/item feedback into implicit feedback.
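To make the effect of binarization concrete, here is a small Python sketch of one common way to turn explicit ratings into implicit feedback. It is not taken from Elliot's code. It assumes that every observed interaction becomes a positive signal with value 1; the function name binarize and its threshold argument are hypothetical and exist only for this illustration.

```python
def binarize(interactions, threshold=None):
    """Turn explicit (user, item, rating) triples into implicit feedback.

    With threshold=None, every observed pair becomes a positive signal 1.
    With a numeric threshold (a hypothetical extension), only ratings at or
    above it are kept as positives and the rest are dropped.
    """
    result = []
    for user, item, rating in interactions:
        if threshold is None or rating >= threshold:
            result.append((user, item, 1))
    return result


# Hypothetical explicit ratings.
explicit = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)]
implicit = binarize(explicit)
```

After this step the rating values no longer carry preference strength; only the presence of an interaction matters.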
experiment:
  data_config:
    strategy: hierarchy
    root_folder: this/is/the/path
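The parameter rules for the three strategies can be summarized in a short Python check. This is a sketch under stated assumptions, not Elliot's own validation logic: the REQUIRED table and the missing_keys helper are hypothetical names, the config is represented as a plain dict rather than a parsed YAML file, and dataset_path is treated as required for the dataset strategy.

```python
# Keys each strategy needs, as described in this section.
# Optional keys such as validation_path and binarize are not checked here.
REQUIRED = {
    "dataset": {"dataset_path"},
    "fixed": {"train_path", "test_path"},
    "hierarchy": {"root_folder"},
}


def missing_keys(data_config):
    """Return the sorted list of required keys absent from data_config."""
    strategy = data_config.get("strategy")
    if strategy not in REQUIRED:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(REQUIRED[strategy] - set(data_config))


# A fixed configuration without a validation split is complete.
fixed_ok = missing_keys({
    "strategy": "fixed",
    "train_path": "this/is/the/path.tsv",
    "test_path": "this/is/the/path.tsv",
})

# A fixed configuration that forgets test_path is reported as incomplete.
fixed_bad = missing_keys({
    "strategy": "fixed",
    "train_path": "this/is/the/path.tsv",
})
```

Running a check like this before launching an experiment catches a missing path early, although the framework's own error messages remain the authoritative source.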