Truncation strategies involve distorting preferences by shortening the list of acceptable matches without changing the order of the partners that remain. For convenience, truthful reporting also counts as a truncation strategy: it is the trivial truncation that removes nothing. To investigate strategic behavior in a centralized clearinghouse based on the Gale-Shapley deferred acceptance algorithm, a laboratory experiment was conducted to test how often agents strategically misrepresent their preferences by submitting a “truncation of their true preferences”. The experimental design used a restricted environment in which a particular form of truncation was always a best response.
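To make the incentive concrete, here is a small sketch of proposer-proposing deferred acceptance together with a truncation helper. The three-by-three instance and the names `deferred_acceptance`, `truncation`, p1 to p3 and r1 to r3 are hypothetical illustrations, not the experiment's actual design. In this instance receiver r1 gets its second choice when it reports truthfully and its first choice when it reports only its top partner. In a variant where that partner ends up matched elsewhere, the same truncation leaves r1 unmatched, which is exactly the risk of truncating too much.

```python
from typing import Dict, List, Optional

Prefs = Dict[str, List[str]]  # agent -> acceptable partners, most preferred first


def deferred_acceptance(proposers: Prefs, receivers: Prefs) -> Dict[str, Optional[str]]:
    """Proposer-proposing Gale-Shapley: returns receiver -> matched proposer (or None)."""
    rank = {r: {p: i for i, p in enumerate(lst)} for r, lst in receivers.items()}
    next_choice = {p: 0 for p in proposers}
    held: Dict[str, Optional[str]] = {r: None for r in receivers}
    free = list(proposers)
    while free:
        p = free.pop()
        if next_choice[p] >= len(proposers[p]):
            continue  # rejected by everyone p listed: p stays unmatched
        r = proposers[p][next_choice[p]]
        next_choice[p] += 1
        current = held[r]
        if p not in rank[r]:
            free.append(p)  # r did not list p as acceptable
        elif current is None:
            held[r] = p  # tentative acceptance, deferred until the end
        elif rank[r][p] < rank[r][current]:
            held[r] = p
            free.append(current)  # r trades up and releases its previous match
        else:
            free.append(p)
    return held


def truncation(prefs: List[str], k: int) -> List[str]:
    """Keep the top k entries in their original order (k = len(prefs) is truth-telling)."""
    return prefs[:k]


# Hypothetical 3x3 instance for illustration only.
proposers = {"p1": ["r2", "r1", "r3"], "p2": ["r1", "r2", "r3"], "p3": ["r1", "r2", "r3"]}
receivers = {"r1": ["p1", "p2", "p3"], "r2": ["p2", "p1", "p3"], "r3": ["p1", "p2", "p3"]}

truthful = deferred_acceptance(proposers, receivers)
truncated = deferred_acceptance(proposers, {**receivers, "r1": truncation(receivers["r1"], 1)})
print(truthful["r1"], truncated["r1"])  # p2 p1

# If r2 instead ranks p1 first, the same truncation leaves r1 unmatched.
receivers_b = {**receivers, "r2": ["p1", "p2", "p3"]}
print(deferred_acceptance(proposers, receivers_b)["r1"])  # p2
print(deferred_acceptance(proposers, {**receivers_b, "r1": truncation(receivers_b["r1"], 1)})["r1"])  # None
```

The outcome of deferred acceptance does not depend on the order in which free proposers are processed, so using a simple stack (`free.pop()`) is enough here.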
The results showed that subjects do not truncate their preferences more often when doing so is more profitable. However, they truncate less often when there is a risk of “truncating too much” and remaining unmatched. This suggests that behavioral insights can play an important role in market design.

In tokenizers for language models, such as those in Hugging Face Transformers, truncation means something different: shortening token sequences so that they fit a maximum length. For pairs of inputs, besides longest_first there are the strategies only_first and only_second, which apply truncation exclusively to the first or to the second sequence of the pair.
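The following pure-Python sketch shows what only_first and only_second do to a pair of token-ID lists. It is not the Transformers implementation: `truncate_pair_only` is a hypothetical helper, it ignores special tokens such as [CLS] and [SEP], and raising an error when the chosen sequence is too short is a choice made for this sketch. In Transformers itself you would pass the strategy name to the tokenizer (for example `truncation="only_second"`, common in question answering where the context is the second sequence).

```python
from typing import List, Tuple


def truncate_pair_only(first: List[int], second: List[int], max_length: int,
                       strategy: str) -> Tuple[List[int], List[int]]:
    """Drop tokens from the end of ONE sequence until the pair fits in max_length.

    Special tokens such as [CLS]/[SEP] are ignored in this sketch.
    """
    excess = len(first) + len(second) - max_length
    if excess <= 0:
        return first, second  # already fits: nothing to remove
    if strategy == "only_first":
        target = first
    elif strategy == "only_second":
        target = second
    else:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    if excess > len(target):
        # Choice made for this sketch: fail loudly instead of touching the other sequence.
        raise ValueError("the selected sequence is too short to absorb the excess")
    kept = target[: len(target) - excess]
    return (kept, second) if strategy == "only_first" else (first, kept)


question, context = [1, 2, 3], [10, 11, 12, 13, 14, 15]
print(truncate_pair_only(question, context, 7, "only_second"))  # ([1, 2, 3], [10, 11, 12, 13])
print(truncate_pair_only(question, context, 7, "only_first"))   # ([1], [10, 11, 12, 13, 14, 15])
```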
Batched inputs often differ in length, so they cannot be converted directly into fixed-size tensors. Padding and truncation are strategies for dealing with this problem: they create rectangular tensors from batches of varying lengths. When you set the truncation strategy to longest_first, the tokenizer compares the lengths of text and text_pair each time a token needs to be removed and removes the token from the longer one. The target length is set by max_length; if it is left unset, the model's predefined maximum input length is used, and if the model has no specific maximum input length, truncation or padding to max_length is disabled. For a model limited to 512 tokens, plain truncation keeps only the first 512 tokens, whereas a TF-IDF-based approach would instead keep the 512 tokens with the highest TF-IDF scores.
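The longest_first rule and the padding step can be sketched in a few lines of plain Python. The helper names `truncate_longest_first` and `pad_batch`, the pad ID 0 and the tie-breaking rule (remove from the second sequence when both are equally long) are assumptions of this sketch rather than details taken from any library; special tokens are again ignored, and `max_length` is assumed to be non-negative.

```python
from typing import List, Tuple


def truncate_longest_first(first: List[int], second: List[int],
                           max_length: int) -> Tuple[List[int], List[int]]:
    """Remove one token at a time from the end of whichever sequence is currently longer."""
    first, second = list(first), list(second)  # copy so the caller's lists are untouched
    while len(first) + len(second) > max_length:
        if len(first) > len(second):
            first.pop()
        else:
            second.pop()  # ties go to the second sequence in this sketch
    return first, second


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one; also return an attention mask."""
    width = max(len(seq) for seq in batch)
    ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return ids, mask


a, b = truncate_longest_first([1, 2, 3, 4], [7, 8, 9, 10], 5)
print(a, b)  # [1, 2, 3] [7, 8]
ids, mask = pad_batch([a + b, [5, 6]])
print(ids)   # [[1, 2, 3, 7, 8], [5, 6, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 1, 1], [1, 1, 0, 0, 0]]
```

Every row of `ids` now has the same length, which is what makes the batch rectangular; the mask records which positions hold real tokens.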
In most cases, padding the batch to the length of its longest sequence and truncating to the maximum length the model can accept works quite well. If none of the built-in strategies, including only_second, meets your requirements, for example because you want to cut the input off at a fixed length rather than balance text against text_pair, you can implement your own truncation step that removes tokens from the right (the end of the sequence) instead of comparing the lengths of text and text_pair.
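One way to do this, sketched under the assumption that you tokenize without truncation and then post-process the token IDs yourself, is to treat text and text_pair as a single stream and cut it at `max_length`. `truncate_stream_right` and `prepare_batch` are hypothetical helpers, not library functions; special tokens are ignored and `max_length` is assumed to be non-negative.

```python
from typing import List, Optional, Tuple


def truncate_stream_right(first: List[int], second: Optional[List[int]],
                          max_length: int) -> Tuple[List[int], List[int]]:
    """Hypothetical custom strategy: read first + second as one stream and cut it at
    max_length, so tokens disappear from the end of the stream (second, then first)
    without comparing the two lengths."""
    second = second or []
    kept_first = first[:max_length]
    kept_second = second[: max(0, max_length - len(kept_first))]
    return kept_first, kept_second


def prepare_batch(pairs: List[Tuple[List[int], Optional[List[int]]]], max_length: int,
                  pad_id: int = 0) -> List[List[int]]:
    """Truncate each pair with the custom strategy, then pad to the longest row."""
    rows = []
    for first, second in pairs:
        kept_first, kept_second = truncate_stream_right(first, second, max_length)
        rows.append(kept_first + kept_second)
    width = max(len(row) for row in rows)
    return [row + [pad_id] * (width - len(row)) for row in rows]


print(truncate_stream_right([1, 2, 3], [4, 5, 6, 7], 5))  # ([1, 2, 3], [4, 5])
print(truncate_stream_right([1, 2, 3], [4, 5, 6, 7], 2))  # ([1, 2], [])
print(prepare_batch([([1, 2, 3], [4, 5, 6, 7]), ([8], None)], 5))  # [[1, 2, 3, 4, 5], [8, 0, 0, 0, 0]]
```

While the first sequence fits within `max_length`, this behaves like only_second; it differs once the first sequence alone is too long, because it then keeps cutting into the first sequence instead of stopping.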