Data Preparation

The first key component of the config file is the data_config section.

experiment:
  data_config:
    strategy: dataset|fixed|hierarchy
    dataset_path: this/is/the/path.tsv
    root_folder: this/is/the/path
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv
    side_information:
        - dataloader: FeatureLoader1
          map: this/is/the/path.tsv
          features: this/is/the/path.tsv
          properties: this/is/the/path.conf
        - dataloader: FeatureLoader2
          folder_map_features: this/is/the/path/folder
  binarize: True

In this section, we define which input files to load and how they should be loaded.

In the following, a dataset is a tab-separated-value file that contains one interaction per row, in the format:

UserID ItemID Rating [ TimeStamp ]

where TimeStamp is optional.
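
To make the row format concrete, the following is a minimal sketch of a reader for such a file, written with the Python standard library only. It is not Elliot's loader: the names Interaction and read_interactions are hypothetical, and treating IDs as strings, ratings as floats, and timestamps as integers is an assumption that the text does not state.

```python
import csv
import io
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    user_id: str
    item_id: str
    rating: float
    timestamp: Optional[int]  # None when the optional fourth column is absent


def read_interactions(handle):
    """Parse tab-separated rows of the form: UserID ItemID Rating [TimeStamp]."""
    interactions = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(row)}: {row}")
        # Assumption: timestamps are integers (for example, Unix epoch seconds).
        timestamp = int(row[3]) if len(row) == 4 else None
        interactions.append(Interaction(row[0], row[1], float(row[2]), timestamp))
    return interactions


# Two rows with a timestamp and one without, to exercise both shapes.
sample = io.StringIO("1\t10\t4.0\t978300760\n1\t20\t3.0\t978302109\n2\t10\t5.0\n")
rows = read_interactions(sample)
```

Rows with three columns are accepted alongside rows with four, which mirrors the optional TimeStamp field; any other column count is rejected early rather than silently truncated.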

Data Preparation Strategies

Depending on the kind of data we have, we can choose among three loading strategies: dataset, fixed, and hierarchy.

dataset assumes that the input data has NOT been previously split into training, validation, and test sets. For this reason, ONLY the dataset strategy allows us to perform prefiltering and splitting operations later.

dataset takes just ONE mandatory parameter: dataset_path, which points to the stored dataset.

experiment:
  data_config:
    strategy: dataset
    dataset_path: this/is/the/path.tsv

The fixed strategy assumes that our data has been previously split into training/validation/test sets or training/test sets. Since the data is assumed to be already split, no further prefiltering or splitting operations are performed.

fixed takes two mandatory parameters: train_path and test_path, and one optional parameter, validation_path.

experiment:
  data_config:
    strategy: fixed
    train_path: this/is/the/path.tsv
    validation_path: this/is/the/path.tsv
    test_path: this/is/the/path.tsv

The last strategy is hierarchy. It is designed to load a dataset that has been previously split and filtered with Elliot. Here too, the data is assumed to be already split, and no further prefiltering or splitting operations are needed.

hierarchy takes one mandatory parameter, root_folder, that points to the folder where we previously stored the split files.
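
The parameter requirements of the three strategies described so far can be summarized in a small check. This is only a sketch over a plain Python dictionary that mirrors the data_config section; the names REQUIRED, check_data_config, and can_prefilter_and_split are hypothetical and are not part of Elliot's API.

```python
# Mandatory parameters per strategy, as described in the text.
# validation_path is optional for the fixed strategy, so it is not listed.
REQUIRED = {
    "dataset": ("dataset_path",),
    "fixed": ("train_path", "test_path"),
    "hierarchy": ("root_folder",),
}


def check_data_config(data_config):
    """Return the mandatory keys that are missing for the chosen strategy."""
    strategy = data_config.get("strategy")
    if strategy not in REQUIRED:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [key for key in REQUIRED[strategy] if key not in data_config]


def can_prefilter_and_split(data_config):
    """Only the dataset strategy leaves room for prefiltering and splitting."""
    return data_config.get("strategy") == "dataset"


# A fixed configuration that forgot test_path.
fixed_cfg = {"strategy": "fixed", "train_path": "this/is/the/path.tsv"}
missing = check_data_config(fixed_cfg)
```

Here missing contains the single entry test_path, while a configuration such as {"strategy": "hierarchy", "root_folder": "this/is/the/path"} yields an empty list.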

experiment:
  data_config:
    strategy: hierarchy
    root_folder: this/is/the/path

When the fixed or dataset strategy is selected, it is also possible to use the binarize flag to transform explicit user/item feedback into implicit feedback.
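
As an illustration of what turning explicit feedback into implicit feedback can mean, the sketch below replaces every rating with 1.0, so that each observed interaction counts as a positive signal. This is an assumption about the semantics of the binarize flag, not a description of Elliot's implementation; the function name binarize and the tuple layout are hypothetical.

```python
def binarize(interactions):
    """Replace each explicit rating with 1.0, keeping user, item, and any timestamp."""
    # Assumption: every observed interaction is treated as positive feedback.
    return [(user, item, 1.0, *rest) for user, item, _rating, *rest in interactions]


# Rows follow the UserID ItemID Rating [TimeStamp] layout.
explicit = [("1", "10", 4.0, 978300760), ("1", "20", 2.0), ("2", "10", 5.0)]
implicit = binarize(explicit)
```

Note that under this assumption a low rating such as 2.0 also becomes 1.0; a threshold-based variant would need an additional filtering step that the text does not describe.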