Splitting Data

If needed, the data is served to the Splitting module. In detail, Elliot provides (i) Temporal, (ii) Random, and (iii) Fix strategies. The Temporal strategy splits the user-item interactions based on the transaction timestamp, i.e., fixing the timestamp, finding the optimal one, or adopting a hold-out (HO) mechanism. The Random strategy includes hold-out (HO), K-repeated hold-out (K-HO), and cross-validation (CV). Table 1 provides further configuration details. Finally, the Fix strategy exploits a precomputed splitting.

Elliot provides several splitting strategies. To enable the splitting operations, we insert the corresponding section into the configuration file (pipe-separated values below indicate the available alternatives for a field):

experiment:
  splitting:
    save_on_disk: True
    save_folder: this/is/the/path/
    test_splitting:
        strategy: fixed_timestamp|temporal_hold_out|random_subsampling|random_cross_validation
        timestamp: best|1609786061
        test_ratio: 0.2
        leave_n_out: 1
        folds: 5
    validation_splitting:
        strategy: fixed_timestamp|temporal_hold_out|random_subsampling|random_cross_validation
        timestamp: best|1609786061
        test_ratio: 0.2
        leave_n_out: 1
        folds: 5

Before detailing the splitting configurations, we can configure Elliot to save the split files to disk once the splitting operation is completed.

To this end, we can insert two fields into the section: save_on_disk and save_folder.

save_on_disk enables the writing process, and save_folder specifies the filesystem location where the split files will be saved:

experiment:
  splitting:
    save_on_disk: True
    save_folder: this/is/the/path/

Now, we can insert one (or two) specific subsections to detail the train/test and train/validation splitting via the corresponding fields: test_splitting and validation_splitting. test_splitting is mandatory, while validation_splitting is optional. Since the two subsections follow the same guidelines, we detail test_splitting here without loss of generality.

Elliot enables four splitting families: fixed_timestamp, temporal_hold_out, random_subsampling, and random_cross_validation.

fixed_timestamp assumes that a specific timestamp separates past interactions (train) from future interactions (test). It takes the parameter timestamp, which can take one of two kinds of values: a long corresponding to a specific timestamp, or the string best, in which case the optimal timestamp is computed following Anelli et al.

experiment:
  splitting:
    test_splitting:
        strategy: fixed_timestamp
        timestamp: 1609786061

experiment:
  splitting:
    test_splitting:
        strategy: fixed_timestamp
        timestamp: best
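
To make these semantics concrete, the following Python sketch applies a fixed-timestamp split to (user, item, timestamp) tuples. It only illustrates the idea and is not Elliot's implementation; in particular, it does not cover the best option, which searches for the optimal timestamp.

# Illustrative sketch: interactions before the threshold form the train set,
# the remaining (future) interactions form the test set.
interactions = [
    ("u1", "i1", 1609780000),
    ("u1", "i2", 1609790000),
    ("u2", "i3", 1609770000),
    ("u2", "i4", 1609795000),
]

threshold = 1609786061  # the fixed timestamp set in the configuration above

train = [t for t in interactions if t[2] < threshold]
test = [t for t in interactions if t[2] >= threshold]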

temporal_hold_out relies on a temporal split of each user's transactions. The split can follow either a ratio-based or a leave-n-out-based approach. If we enable the test_ratio field with a float value, Elliot retains the last (100 * test_ratio)% of each user's transactions for the test set. If we enable the leave_n_out field with an int value, Elliot retains the last leave_n_out transactions of each user for the test set.

experiment:
  splitting:
    test_splitting:
        strategy: temporal_hold_out
        test_ratio: 0.2

experiment:
  splitting:
    test_splitting:
        strategy: temporal_hold_out
        leave_n_out: 1
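
As a rough illustration of the two temporal hold-out variants described above (not Elliot's actual code, and the rounding choice is only an assumption), a per-user temporal split in Python could look like this:

import math
from collections import defaultdict

def temporal_hold_out(interactions, test_ratio=None, leave_n_out=None):
    # Per-user temporal split: the most recent transactions of each user form the test set.
    per_user = defaultdict(list)
    for user, item, timestamp in interactions:
        per_user[user].append((user, item, timestamp))

    train, test = [], []
    for transactions in per_user.values():
        transactions.sort(key=lambda t: t[2])  # oldest to newest
        if test_ratio is not None:
            n_test = math.ceil(len(transactions) * test_ratio)  # ratio-based variant
        else:
            n_test = leave_n_out                                # leave-n-out variant
        train.extend(transactions[:-n_test] if n_test else transactions)
        test.extend(transactions[-n_test:] if n_test else [])
    return train, test

train, test = temporal_hold_out(
    [("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3), ("u2", "i1", 4), ("u2", "i4", 5)],
    leave_n_out=1,
)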

random_subsampling generalizes the random hold-out strategy. It takes a test_ratio parameter with a float value to define the train/test ratio for user-based hold-out splitting. Alternatively, it can take leave_n_out with an int value to define the number of transactions retained for the test set. Moreover, the splitting operation can be repeated by enabling the folds field and passing an int value. In that case, the overall splitting strategy corresponds to a user-based random subsampling strategy.

experiment:
  splitting:
    test_splitting:
        strategy: random_subsampling
        test_ratio: 0.2

experiment:
  splitting:
    test_splitting:
        strategy: random_subsampling
        test_ratio: 0.2
        folds: 5

experiment:
  splitting:
    test_splitting:
        strategy: random_subsampling
        leave_n_out: 1
        folds: 5
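
Purely as an illustration of the semantics of the ratio-based variant (again, not Elliot's implementation, and the per-user rounding is an assumption), a user-based random subsampling with repeated folds can be sketched in Python as follows:

import random
from collections import defaultdict

def random_subsampling(interactions, test_ratio=0.2, folds=1, seed=42):
    # Repeat a per-user random hold-out `folds` times; each fold is an independent (train, test) pair.
    rng = random.Random(seed)
    per_user = defaultdict(list)
    for user, item, timestamp in interactions:
        per_user[user].append((user, item, timestamp))

    splits = []
    for _ in range(folds):
        train, test = [], []
        for transactions in per_user.values():
            n_test = max(1, round(len(transactions) * test_ratio))
            test_indices = set(rng.sample(range(len(transactions)), n_test))
            for index, transaction in enumerate(transactions):
                (test if index in test_indices else train).append(transaction)
        splits.append((train, test))
    return splits

splits = random_subsampling(
    [("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3), ("u2", "i1", 4), ("u2", "i4", 5)],
    test_ratio=0.2,
    folds=5,
)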

random_cross_validation adopts a k-fold cross-validation splitting strategy. It takes the parameter folds with an int value, which defines the overall number of folds to consider.

experiment:
  splitting:
    test_splitting:
        strategy: random_cross_validation
        folds: 5
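
Once the splitting subsection(s) are in place, the configuration file is consumed by Elliot's experiment runner. A minimal sketch, where the file name config_files/splitting_config.yml is only a placeholder for the path of the configuration file containing the section above:

from elliot.run import run_experiment

# Runs the whole Elliot pipeline (data loading, splitting, training, evaluation)
# driven by the configuration file.
run_experiment("config_files/splitting_config.yml")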