Title: | Separate a Data Frame by Normalization |
---|---|
Description: | Separate a data frame in two based on key columns. The function unjoin() provides an inside-out version of a nested data frame. This is used to identify duplication and normalize it (in the database sense) by linking two tables with the redundancy removed. This is a basic requirement for detecting topology within spatial structures that has motivated the need for this package as a building block for workflows within more applied projects. |
Authors: | Michael D. Sumner [aut, cre], Simon Wotherspoon [ctb], Hadley Wickham [ctb] (named the concept, provided excellent guidance via tidyr source code) |
Maintainer: | Michael D. Sumner <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-28 03:08:44 UTC |
Source: | https://github.com/hypertidy/unjoin |
Split a table in two and remove repeated values.
unjoin(data, ..., key_col = "idx0")

## S3 method for class 'data.frame'
unjoin(data, ..., key_col = ".idx0")

## S3 method for class 'unjoin'
unjoin(data, ..., key_col = ".idx0")
data |
A data frame. |
... |
Specification of columns to unjoin by. For full details, see the 'dplyr::select' documentation. |
key_col |
The name of the new column to key the two output data frames. |
The input data frame is treated as "data", and the new data frame is treated as the normalized key. The split-off, de-duplicated table is named by the 'key_col' argument (default ".idx0") and shares this name with the common key column that links the two tables.
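To make the two-table result concrete, here is a minimal base R sketch of the normalization idea. It is not the package's implementation: `manual_unjoin` is a hypothetical helper written for illustration, and it assumes only the output shape described above (a list holding the de-duplicated table under the 'key_col' name and the remaining columns under `data`).

```r
# Hand-rolled sketch of normalization in base R.
# `manual_unjoin` is a hypothetical helper, NOT part of the unjoin package.
manual_unjoin <- function(data, cols, key_col = ".idx0") {
  # De-duplicated combinations of the chosen columns, one row each
  key_tab <- unique(data[, cols, drop = FALSE])
  rownames(key_tab) <- NULL
  key_tab[[key_col]] <- seq_len(nrow(key_tab))

  # Build a string signature per row so rows can be matched to key_tab
  row_id <- do.call(paste, c(data[, cols, drop = FALSE], sep = "\r"))
  key_id <- do.call(paste, c(key_tab[, cols, drop = FALSE], sep = "\r"))

  # Remaining columns plus the key that links back to key_tab
  main <- data[, setdiff(names(data), cols), drop = FALSE]
  main[[key_col]] <- match(row_id, key_id)

  out <- list(key_tab, main)
  names(out) <- c(key_col, "data")
  out
}

x <- manual_unjoin(chickwts, "feed")
nrow(x$.idx0)   # number of distinct feed values
head(x$data)    # weight plus the linking key
```

Because the key is a plain integer index in this sketch, `x$.idx0$feed[x$data$.idx0]` recovers the original `feed` column, which is the redundancy the split removes.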
It's not yet clear if this flexibility around naming is a good idea, but it enables a simple scheme for chaining unjoins, though you'd better not use the same 'key_col' again.
This is a subset of the tasks done by nest.
See 'dplyr::inner_join' for the inverse operation.
See 'tidyr::nest' for the complementary operation, which results in one nested data frame.
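For contrast, a short sketch of the nested form that 'tidyr::nest' produces: a single data frame with a list-column, rather than two tables linked by a key. It assumes tidyr (version 1.0.0 or later, for the `data = ...` syntax) is installed.

```r
library(tidyr)

# One row per Species; the remaining columns are packed into the
# list-column `data`, giving a single nested data frame.
nested <- nest(iris, data = -Species)
nrow(nested)
names(nested)
```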
library(dplyr)
data("Seatbelts", package = "datasets")

# Split off the front and law columns into a de-duplicated key table
x <- unjoin(as.data.frame(Seatbelts), front, law)

# Round trip: rejoin the two tables and compare with the original
y <- inner_join(x$.idx0, x$data) %>% select(-.idx0)
all.equal(y[colnames(Seatbelts)], as.data.frame(Seatbelts))

iris %>% unjoin(-Species)
chickwts %>% unjoin(weight)

if (require("gapminder")) {
  gapminder %>% group_by(country, continent) %>% unjoin()
  gapminder %>% unjoin(-country, -continent)
  unjoin(gapminder)
}

# Chain unjoins, giving the second one a different key_col
unjoin(iris, Petal.Width) %>% unjoin(Species, key_col = ".idx1")