| Title: | Structures for Preference Data |
|---|---|
| Description: | Convenient structures for creating, sourcing, reading, writing and manipulating ordinal preference data. Methods for reading from and writing to PrefLib formats. See Nicholas Mattei and Toby Walsh "PrefLib: A Library of Preference Data" (2013) <doi:10.1007/978-3-642-41575-3_20>. |
| Authors: | Floyd Everest [aut, cre] (ORCID: <https://orcid.org/0000-0002-2726-6736>), Heather Turner [aut] (ORCID: <https://orcid.org/0000-0002-1256-3375>), Damjan Vukcevic [aut] (ORCID: <https://orcid.org/0000-0001-7780-9586>) |
| Maintainer: | Floyd Everest <[email protected]> |
| License: | GPL-3 |
| Version: | 0.2.0 |
| Built: | 2026-05-27 09:04:21 UTC |
| Source: | https://github.com/fleverest/prefio |
Convert a set of preferences to an adjacency matrix summarising wins and losses between pairs of items.
adjacency(x, preferences_col = NULL, frequency_col = NULL, ...)
x |
A |
preferences_col |
< |
frequency_col |
< |
... |
Currently unused. |
For a preferences object with $N$ items, the adjacency
matrix is an $N$ by $N$ matrix, with element $(i, j)$ being the
number of times item $i$ wins over item $j$. For example, in the
preferences {1} > {3, 4} > {2}, item 1 wins over items 2, 3, and 4,
while items 3 and 4 win over item 2.
If frequency_col is specified, the values in the adjacency matrix are the
weighted counts.
An $N$ by $N$ matrix, where $N$ is the number of
items.
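The definition above can be illustrated with a small base-R sketch. It is not the package's implementation: `adjacency_sketch` is a hypothetical helper that takes a matrix of ranks (one row per preference, one column per item) and an optional vector of frequencies, and treats an unranked item (NA) as taking part in no comparisons, which is an assumption.

```r
# Hypothetical helper, not part of prefio.
# rank_matrix: one row per preference, one column per item; lower rank = better.
# freq: optional weight for each row (defaults to 1 per preference).
adjacency_sketch <- function(rank_matrix, freq = rep(1, nrow(rank_matrix))) {
  items <- colnames(rank_matrix)
  adj <- matrix(0, length(items), length(items), dimnames = list(items, items))
  for (k in seq_len(nrow(rank_matrix))) {
    r <- rank_matrix[k, ]
    # wins[i, j] is TRUE when item i is ranked strictly better than item j.
    wins <- outer(r, r, FUN = "<")
    wins[is.na(wins)] <- FALSE  # assumption: unranked items are never compared
    adj <- adj + freq[k] * wins
  }
  adj
}

# The preference {1} > {3, 4} > {2} from the text, as ranks for items 1 to 4.
m <- matrix(c(1, 3, 2, 2), nrow = 1, dimnames = list(NULL, c("1", "2", "3", "4")))
a <- adjacency_sketch(m)
a_weighted <- adjacency_sketch(m, freq = 3)
```

Here row "1" of `a` records wins over items 2, 3 and 4, and rows "3" and "4" each record a win over item 2; passing `freq = 3` scales every count by three.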
x <- tibble::tribble(
  ~voter_id, ~species, ~food, ~ranking,
  1, "Rabbit", "Apple", 1,
  1, "Rabbit", "Banana", 2,
  1, "Rabbit", "Carrot", 3,
  2, "Monkey", "Banana", 1,
  2, "Monkey", "Apple", 2,
  2, "Monkey", "Carrot", 3
) |>
  long_preferences(
    food_preference,
    id_cols = voter_id,
    item_col = food,
    rank_col = ranking
  ) |>
  dplyr::pull(food_preference) |>
  adjacency()
Complete preferences by adding unselected items as last-place occurrences.
pref_add_unranked(x)
x |
A vector of |
A new vector of preferences, with each selection starting with the
corresponding selections made in x, but with all unranked items placed
last.
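As a rough illustration of "placing unranked items last", the following base-R sketch works on a named vector of ranks rather than on a preferences object. `add_unranked_sketch` is a hypothetical helper, and the choices to tie all unranked items at a single last rank and to turn a blank preference into one tie at rank 1 are assumptions, not confirmed package behaviour.

```r
# Hypothetical helper, not part of prefio. `ranks` is a named vector with one
# entry per item; NA marks an item the voter left unranked.
add_unranked_sketch <- function(ranks) {
  if (all(is.na(ranks))) {
    # Assumption: a blank preference becomes a single tie at rank 1.
    ranks[] <- 1
    return(ranks)
  }
  # Assumption: all unranked items share one rank just below the worst rank.
  ranks[is.na(ranks)] <- max(ranks, na.rm = TRUE) + 1
  ranks
}

completed_ab <- add_unranked_sketch(c(a = 1, b = 2, c = NA))  # from "a > b"
completed_b <- add_unranked_sketch(c(a = NA, b = 1, c = NA))  # from "b"
```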
# Complete partial rankings by adding unranked items last
pref_add_unranked(preferences(c("a > b", "c > a", "b")))
Check if a preference is blank.
pref_blank(x)
x |
A vector of |
A logical vector indicating which preferences are blank, i.e., [].
pref_blank(preferences(c("a > b > c", "", "b > c")))
Covariance matrix for preferences, calculated using the rankings matrix.
pref_cov(x, preferences_col = NULL, frequency_col = NULL, ...)
x |
A vector of preferences, or a tibble with a column of preferences. |
preferences_col |
< |
frequency_col |
< |
... |
Extra arguments to be passed to |
A covariance matrix containing covariances for the ranks assigned to item pairs.
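To make "covariances for the ranks assigned to item pairs" concrete, here is a base-R sketch that starts from a hand-written rankings matrix. It does not call the package. Handling frequencies by repeating rows is one simple way to weight the calculation, and whether pref_cov normalises in the same way is an assumption.

```r
# Rankings matrix for "a > b > c", "b > c > a", "c > a > b":
# one row per preference, one column per item, entries are ranks.
ranks <- matrix(
  c(1, 2, 3,
    3, 1, 2,
    2, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
unweighted <- stats::cov(ranks)

# Frequency weighting, sketched by repeating each row `freq` times
# (assumption: this need not match the package's own normalisation).
freq <- c(3, 2)
expanded <- ranks[rep(1:2, times = freq), ]
weighted <- stats::cov(expanded)
```

A negative off-diagonal entry means that when one item is ranked better, the other tends to be ranked worse.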
# Simple covariance on a vector of preferences
prefs <- preferences(c("a > b > c", "b > c > a", "c > a > b"))
pref_cov(prefs)

# Weighted covariance by frequency
df <- tibble::tibble(
  prefs = preferences(c("a > b > c", "b > c > a")),
  freq = c(3, 2)
)
pref_cov(df, preferences_col = prefs, frequency_col = freq)
Get the name of the item(s) assigned a specific rank, e.g., first.
pref_get_items(x, rank, drop = FALSE)
x |
A vector of |
rank |
A single integer, the rank which you would like to inspect. |
drop |
When |
A list containing the name(s) of the item(s) ranked rank in each of
the preferences in x.
# Get items ranked first
pref_get_items(preferences(c("a > b > c", "b = c > a")), rank = 1)

# Get items ranked second
pref_get_items(preferences(c("a > b > c", "b = c > a")), rank = 2)

# Get items ranked first, dropping blank preferences
pref_get_items(preferences(c("a > b > c", "", "b = c > a")), rank = 1, drop = TRUE)
Get the rank assigned to a specific item in a set of preferences.
pref_get_rank(x, item_name)
x |
A vector of |
item_name |
The name of the item to extract the rank for. |
The rank of item_name for each of the preferences in x.
pref_get_rank(preferences(c("a > b > c", "b > c = a", "")), "a")
A very rudimentary implementation of the instant-runoff voting (IRV) counting algorithm. It does not handle ties elegantly, and should only be used for demonstration purposes. This implementation eliminates all candidates with the fewest first-choice votes in each round until one candidate has a majority or fewer than two candidates remain.
pref_irv(x, preferences_col = NULL, frequency_col = NULL)
x |
A vector of preferences, or a tibble with a column of preferences. |
preferences_col |
< |
frequency_col |
< |
A list containing:
The winning candidate(s) after IRV counting
A list of tibbles, each containing vote tallies for each round
Character vector of eliminated candidates in order
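The elimination rule described for pref_irv can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: `irv_sketch` is a hypothetical helper, ballots are plain character vectors of strict orderings, a majority is taken over non-exhausted ballots, and a complete tie among the remaining candidates ends the count with all of them reported.

```r
# Hypothetical helper, not part of prefio. Each ballot is a character vector
# of candidates, most preferred first. Assumes at least one non-empty ballot.
irv_sketch <- function(ballots) {
  remaining <- unique(unlist(ballots))
  eliminated <- character(0)
  rounds <- list()
  repeat {
    # First choice on each ballot among candidates still in the race
    # (NA when every candidate on the ballot has been eliminated).
    first <- vapply(ballots, function(b) {
      b <- b[b %in% remaining]
      if (length(b) > 0) b[1] else NA_character_
    }, character(1))
    tally <- vapply(remaining, function(cand) sum(first == cand, na.rm = TRUE),
                    numeric(1))
    rounds[[length(rounds) + 1]] <- tally
    has_majority <- max(tally) > sum(tally) / 2
    losers <- names(tally)[tally == min(tally)]
    # Stop on a majority, when fewer than two candidates remain, or when
    # all remaining candidates are tied (assumption: report them all).
    if (has_majority || length(remaining) < 2 ||
        length(losers) == length(remaining)) break
    # Eliminate every candidate sharing the fewest first-choice votes.
    eliminated <- c(eliminated, losers)
    remaining <- setdiff(remaining, losers)
  }
  list(winner = names(tally)[tally == max(tally)],
       rounds = rounds, eliminated = eliminated)
}

ballots <- strsplit(c(
  "alice > bob > charlie > david", "alice > bob > charlie > david",
  "alice > charlie > bob > david", "bob > alice > charlie > david",
  "bob > charlie > alice > david", "bob > charlie > alice > david",
  "charlie > david > alice > bob", "charlie > david > bob > alice",
  "david > charlie > bob > alice", "david > charlie > bob > alice"
), " > ", fixed = TRUE)
result <- irv_sketch(ballots)
```

With these ten ballots, charlie and david share the fewest first-choice votes in the first round, so both are eliminated together, which is exactly the tie behaviour the description warns about.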
# Multi-round election with four candidates
prefs <- preferences(c(
  "alice > bob > charlie > david",
  "alice > bob > charlie > david",
  "alice > charlie > bob > david",
  "bob > alice > charlie > david",
  "bob > charlie > alice > david",
  "bob > charlie > alice > david",
  "charlie > david > alice > bob",
  "charlie > david > bob > alice",
  "david > charlie > bob > alice",
  "david > charlie > bob > alice"
))
result <- pref_irv(prefs)
result$winner # Final winner after elimination rounds
result$rounds # Vote tallies for each round

# Using aggregated data frame
df <- tibble::tibble(
  prefs = preferences(c(
    "alice > bob > charlie > david",
    "alice > charlie > bob > david",
    "bob > alice > charlie > david",
    "bob > charlie > alice > david",
    "charlie > david > alice > bob",
    "charlie > david > bob > alice",
    "david > charlie > bob > alice"
  )),
  freq = c(2, 1, 1, 2, 1, 1, 2)
)
pref_irv(df, prefs, freq)
Keep only specified items from preferences.
pref_keep(x, items)
x |
A vector of |
items |
The names of the items which should be kept for preferences in |
A new vector of preferences, but only containing items from each selection.
# Keep only 'a' and 'c'
pref_keep(preferences(c("a > b > c", "b > c > a")), c("a", "c"))
Check the length (number of rankings) of a preference.
pref_length(x)
x |
A vector of |
The number of items listed on each of the preferences.
pref_length(preferences(c("a > b > c", "", "b > c")))
Remove specified items from preferences.
pref_omit(x, items)
x |
A vector of |
items |
The names of the items which should be removed from the preferences in |
A new vector of preferences, but with items removed from each selection.
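Removing items from a preference can be pictured on a named vector of ranks, as in the base-R sketch below. `omit_sketch` is a hypothetical helper, and re-numbering the surviving ranks so they run consecutively from 1 is an assumption about what a sensible result looks like, not a statement of the package's internals.

```r
# Hypothetical helper, not part of prefio. `ranks` is a named vector of ranks
# (lower = more preferred); `items` names the items to remove.
omit_sketch <- function(ranks, items) {
  kept <- ranks[!(names(ranks) %in% items)]
  kept <- kept[!is.na(kept)]
  # Assumption: surviving ranks are re-numbered densely, starting at 1.
  dense <- match(kept, sort(unique(kept)))
  names(dense) <- names(kept)
  dense
}

without_b_1 <- omit_sketch(c(a = 1, b = 2, c = 3), "b")  # from "a > b > c"
without_b_2 <- omit_sketch(c(b = 1, c = 2, a = 3), "b")  # from "b > c > a"
```

Keeping only a set of items is the complementary operation: remove every item whose name is not in the set.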
# Remove 'b'
pref_omit(preferences(c("a > b > c", "b > c > a")), "b")

# Remove 'b' and 'd'
pref_omit(preferences(c("a > b > c > d", "b > c > a > d")), c("b", "d"))
Eliminate lowest (or highest) ranked items from preferences.
pref_pop(x, n = 1L, lowest = TRUE, drop = FALSE)
x |
A vector of |
n |
The number of times to remove the bottom rank. |
lowest |
If |
drop |
If |
A new vector of preferences which is equal to x but with the least
preferred selection dropped for each selection.
# Remove the lowest ranked item from each preference
pref_pop(preferences(c("a > b > c", "b > c > a")))

# Remove the 2 lowest ranked items
pref_pop(preferences(c("a > b > c > d", "b > c > a > d")), n = 2)

# Remove the highest ranked item instead
pref_pop(preferences(c("a > b > c", "b > c > a")), lowest = FALSE)

# Remove blank preferences that result from popping
pref_pop(preferences(c("a > b", "c", "")), drop = TRUE)
Reverse preference rankings.
pref_rev(x, ...)
x |
A vector of |
... |
Not used. |
A vector of preferences with rankings reversed (first becomes last, etc.)
pref_rev(preferences(c("a > b > c", "b > c > a")))
Truncate preferences to a maximum number of ranks.
pref_trunc(x, n = 1L, bottom = FALSE)
x |
A vector of |
n |
The maximum number of ranks to include (positive) or number of ranks to drop (negative). Must be an integer. |
bottom |
If |
A vector of preferences with each selection truncated according to the parameters.
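The way the sign of n and the bottom flag combine resembles base R's head() and tail(). The sketch below uses that analogy on a single strict ordering stored as a character vector. `trunc_sketch` is a hypothetical helper, it ignores ties, and it is offered only as an analogy for the semantics described above.

```r
# Hypothetical helper, not part of prefio. `ordering` lists items from most
# to least preferred, one item per rank (no ties).
trunc_sketch <- function(ordering, n = 1L, bottom = FALSE) {
  # Positive n keeps n ranks; negative n drops |n| ranks from the other end.
  if (bottom) utils::tail(ordering, n) else utils::head(ordering, n)
}

ordering <- c("a", "b", "c", "d")
top_two <- trunc_sketch(ordering, n = 2)                      # keep top 2
bottom_two <- trunc_sketch(ordering, n = 2, bottom = TRUE)    # keep bottom 2
drop_bottom <- trunc_sketch(ordering, n = -2)                 # drop bottom 2
drop_top <- trunc_sketch(ordering, n = -2, bottom = TRUE)     # drop top 2
```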
# Keep only the top 2 ranks
pref_trunc(preferences(c("a > b > c > d", "b > c > a")), n = 2)

# Keep only the bottom 2 ranks
pref_trunc(preferences(c("a > b > c > d", "b > c > a")), n = 2, bottom = TRUE)

# Drop the bottom 2 ranks (keep top ranks)
pref_trunc(preferences(c("a > b > c > d", "b > c > a")), n = -2)

# Drop the top 2 ranks (keep bottom ranks)
pref_trunc(preferences(c("a > b > c > d", "b > c > a")), n = -2, bottom = TRUE)
Ordinal preferences can order every item, or they can order a subset. Some ordinal preference datasets will contain ties between items at a given rank. Hence, there are four distinct types of preferential data:
soc: Strict Orders - Complete List
soi: Strict Orders - Incomplete List
toc: Orders with Ties - Complete List
toi: Orders with Ties - Incomplete List
pref_type(x, n_items = NULL)
x |
A |
n_items |
The number of items, needed to assess whether a selection is
complete or not. Defaults to |
One of c("soc", "soi", "toc", "toi"), indicating the type of
preferences in x (with or without ties / complete or incomplete rankings).
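The four codes combine two yes/no questions: are there ties, and is every item ranked? A base-R sketch of that classification on a rankings matrix is shown here. `pref_type_sketch` is a hypothetical helper, and encoding an unranked item as NA is an assumption made for the illustration.

```r
# Hypothetical helper, not part of prefio. rank_matrix has one row per
# preference and one column per item; NA marks an unranked item.
pref_type_sketch <- function(rank_matrix) {
  # A tie exists when any row repeats a rank among its ranked items.
  has_ties <- any(apply(rank_matrix, 1, function(r) {
    anyDuplicated(r[!is.na(r)]) > 0
  }))
  # The data are incomplete when any item is left unranked in any row.
  incomplete <- anyNA(rank_matrix)
  paste0(if (has_ties) "t" else "s", "o", if (incomplete) "i" else "c")
}

type_soc <- pref_type_sketch(rbind(c(1, 2, 3), c(2, 1, 3)))
type_soi <- pref_type_sketch(rbind(c(1, NA, 2)))
type_toc <- pref_type_sketch(rbind(c(1, 1, 2)))
type_toi <- pref_type_sketch(rbind(c(1, 1, NA)))
```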
A tidy interface for working with ordinal preferences.
long_preferences(
  data,
  col,
  id_cols = NULL,
  rank_col = NULL,
  item_col = NULL,
  item_names = NULL,
  verbose = TRUE,
  unused_fn = NULL,
  na_action = c("drop_rows", "drop_preferences"),
  ...
)

wide_preferences(
  data,
  col = NULL,
  ranking_cols = NULL,
  verbose = TRUE,
  na_action = c("keep_as_partial", "drop_preferences"),
  ...
)

as_preferences(strings, sep = ">", equality = "=", descending = TRUE)

preferences(
  strings = character(0L),
  sep = ">",
  equality = "=",
  descending = TRUE
)

## S3 method for class 'preferences'
format(x, ...)

## S3 method for class 'preferences'
levels(x, ...)
data |
A |
col |
The name of the new column, as a string or symbol. |
id_cols |
< |
rank_col |
< |
item_col |
< |
item_names |
The names of the full set of items. This is necessary when the dataset specifies items by index rather than by name, or when there are items which do not appear in any preference selection. |
verbose |
If |
unused_fn |
When |
na_action |
Specifies how to handle NA values.
|
... |
Unused. |
ranking_cols |
< |
strings |
A character vector of preference strings |
sep |
Character separating the items in the string (default: ">") |
equality |
Character representing equality between items (default: "=") |
descending |
If TRUE, parse as descending order preferences. |
x |
A vector of preferences. |
format |
The format of the data: one of "ordering", "ranking", or
"long" (see above). By default, |
A preferences object, or a modified tibble with a column of preferences when data is
a data.frame or tibble.
# Votes cast by two animals ranking a variety of fruits and vegetables.
# This is not real data, I made this up.
x <- tibble::tribble(
  ~voter_id, ~species, ~food, ~ranking,
  1, "Rabbit", "Apple", 1,
  1, "Rabbit", "Carrot", 2,
  1, "Rabbit", "Banana", 3,
  2, "Monkey", "Banana", 1,
  2, "Monkey", "Apple", 2,
  2, "Monkey", "Carrot", 3
)

# Process preferential data into a single column.
x |>
  long_preferences(
    food_preference,
    id_cols = voter_id,
    item_col = food,
    rank_col = ranking
  )

# The same, but keep the species data.
x |>
  long_preferences(
    food_preference,
    id_cols = voter_id,
    item_col = food,
    rank_col = ranking,
    unused_fn = list(species = dplyr::first)
  )
Convert a set of preferences to a rankings matrix, where each preference defines a single row in the output. The columns in the rankings matrix give the vector of ranks assigned to the corresponding candidate.
ranking_matrix(x, preferences_col = NULL, frequency_col = NULL, ...)
x |
A |
preferences_col |
< |
frequency_col |
< |
... |
Currently unused. |
For a preferences vector of length $N$ with $M$ items, the rankings
matrix is an $N$ by $M$ matrix, with element $(i, j)$ being the
rank assigned to candidate $j$ in the $i$th selection.
An $N$ by $M$ matrix, where $N$ is the number of
preferences, and $M$ is the number of items.
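The layout of the rankings matrix can be reproduced by hand with base R. The sketch below assumes strict orderings stored as character vectors and uses match() to turn each ordering into a row of ranks. It does not call the package and does not handle ties.

```r
# Two strict orderings, most preferred first (illustrative data).
orderings <- list(
  c("Apple", "Banana", "Carrot"),
  c("Banana", "Apple", "Carrot")
)
items <- sort(unique(unlist(orderings)))

# match(items, o) gives each item's position in ordering o, i.e. its rank;
# an item missing from o would get NA (assumption: NA means unranked).
rank_mat <- t(vapply(orderings, function(o) match(items, o),
                     integer(length(items))))
colnames(rank_mat) <- items
```

Each row of `rank_mat` corresponds to one preference and each column to one item, matching the row-by-item layout described above.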
x <- tibble::tribble(
  ~voter_id, ~species, ~food, ~ranking,
  1, "Rabbit", "Apple", 1,
  1, "Rabbit", "Banana", 2,
  1, "Rabbit", "Carrot", 3,
  2, "Monkey", "Banana", 1,
  2, "Monkey", "Apple", 2,
  2, "Monkey", "Carrot", 3
) |>
  long_preferences(
    food_preference,
    id_cols = voter_id,
    item_col = food,
    rank_col = ranking
  ) |>
  dplyr::pull(food_preference) |>
  ranking_matrix()
Read orderings from .soc, .soi, .toc or .toi files, which store
ordinal preference data in the format defined by
PrefLib: A Library for Preferences,
into a preferences object.
read_preflib(
  file,
  from_preflib = FALSE,
  preflib_url = "https://raw.githubusercontent.com/PrefLib/PrefLib-Data/main/datasets/"
)
file |
A preferential data file, conventionally with extension |
from_preflib |
A logical which, when |
preflib_url |
The URL which will be prepended to |
Note that PrefLib refers to the items being ordered as "alternatives".
The file types supported are
.soc: Strict Orders - Complete List
.soi: Strict Orders - Incomplete List
.toc: Orders with Ties - Complete List
.toi: Orders with Ties - Incomplete List
The numerically coded orderings and their frequencies are read into a tibble, storing all original metadata in a "preflib" attribute.
A PrefLib file may be corrupt, in the sense that the ordered alternatives do not match their names. In this case, the file will still be read, but with a warning.
A tibble with two columns: preferences and frequency. The
preferences column contains all the preferential orderings in the file, and
the frequency column contains the relative frequency of each ordering.
The Netflix and cities datasets used in the examples are from Bennett and Lanning (2007) and Caragiannis et al. (2017) respectively. These data sets require a citation for re-use.
Mattei, N. and Walsh, T. (2013) PrefLib: A Library of Preference Data. Proceedings of Third International Conference on Algorithmic Decision Theory (ADT 2013). Lecture Notes in Artificial Intelligence, Springer.
Bennett, J. and Lanning, S. (2007) The Netflix Prize. Proceedings of The KDD Cup and Workshops.
# Can take a little while depending on speed of internet connection

# strict complete orderings of four films on Netflix
netflix <- read_preflib("00004 - netflix/00004-00000138.soc", from_preflib = TRUE)
head(netflix)
levels(netflix$preferences)

# strict incomplete orderings of 6 random cities from 36 in total
cities <- read_preflib("00034 - cities/00034-00000001.soi", from_preflib = TRUE)
Write preferences to .soc, .soi, .toc or .toi file types, as
defined by the PrefLib specification:
PrefLib: A Library for Preferences.
write_preflib(
  x,
  file = "",
  preferences_col = NULL,
  frequency_col = NULL,
  title = NULL,
  publication_date = NULL,
  modification_type = NULL,
  modification_date = NULL,
  description = NULL,
  relates_to = NULL,
  related_files = NULL
)
x |
A |
file |
Either a character string naming a file or a writeable,
open connection. The empty string |
preferences_col |
< |
frequency_col |
< |
title |
The title of the data file, for instance the name of the
election represented in the data file. If not provided, we check for the
presence of |
publication_date |
The date at which the data file was published for the
first time. If not provided, we check for the presence of
|
modification_type |
The modification type of the data: one of
|
modification_date |
The last time the data was modified. If not
provided, we check for the presence of |
description |
A description of the data file, providing additional
information about it. If not provided, we check for the presence of
|
relates_to |
The name of the data file that the current file relates to,
typically the source file in case the current file has been derived from
another one. If not provided, we check for the presence of
|
related_files |
The list of all the data files related to this one,
comma separated. If not provided, we check for the presence of
|
The file types supported are
.soc: Strict Orders - Complete List
.soi: Strict Orders - Incomplete List
.toc: Orders with Ties - Complete List
.toi: Orders with Ties - Incomplete List
The PrefLib format specification requires some additional metadata. Note
that the additional metadata required for the PrefLib specification is not
necessarily required for the write_preflib method; any missing fields
required by the PrefLib format will simply show "NA".
TITLE: The title of the data file, for instance the year of the election represented in the data file.
DESCRIPTION: A description of the data file, providing additional information about it.
RELATES TO: The name of the data file that the current file relates to, typically the source file in case the current file has been derived from another one.
RELATED FILES: The list of all the data files related to this one, comma separated.
PUBLICATION DATE: The date at which the data file was published for the first time.
MODIFICATION TYPE: The modification type of the data. One of:
  original: Data that has only been converted into a PrefLib format.
  induced: Data that has been induced from another context. For example, computing a pairwise relation from a set of strict total orders. No assumptions have been made to create these files, just a change in the expression language.
  imbued: Data that has been imbued with extra information. For example, extending an incomplete partial order by placing all unranked candidates tied at the end.
  synthetic: Data that has been generated artificially.
MODIFICATION DATE: The last time the data was modified.
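The remark that missing metadata fields simply show "NA" can be pictured with a small base-R sketch. `header_sketch` is a hypothetical helper, and the "# FIELD: value" line layout is an assumption about the PrefLib header format. It is not the output of write_preflib.

```r
# Hypothetical helper, not part of prefio: build metadata header lines,
# showing "NA" for any field that was not supplied.
header_sketch <- function(title = NULL, description = NULL) {
  fields <- list(TITLE = title, DESCRIPTION = description)
  values <- vapply(fields, function(v) {
    if (is.null(v)) NA_character_ else as.character(v)
  }, character(1))
  # Assumption: header lines take the form "# FIELD: value".
  paste0("# ", names(values), ": ", values)
}

header_lines <- header_sketch(title = "Animal food preferences")
```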
In addition to these fields, some required PrefLib fields will be generated
automatically depending on arguments to write_preflib() and the attributes
of the aggregated_preferences object being written to file:
FILE NAME: The name of the output file.
DATA TYPE: The data type (one of soc, soi, toc or toi).
NUMBER ALTERNATIVES: The number of items.
ALTERNATIVE NAME X: The name of each item, where X ranges from
0 to length(items).
NUMBER VOTERS: The total number of orderings.
NUMBER UNIQUE ORDERS: The number of distinct orderings.
Note that PrefLib refers to the items as "alternatives".
The "alternatives" in the output file will be the same as the "items" in the
aggregated_preferences object.
No return value. Output will be written to file or stdout.