GitHub - quanteda/quanteda.tidy: Tidyverse extensions for quanteda

quanteda.tidy

About

quanteda.tidy extends the quanteda package with functionality from the “tidyverse”, especially dplyr.
Note that this is not the same as tidytext, which stretches tokens into data.frames. Instead, the tidy functions here operate only on document variables, extending the dplyr verbs to work on quanteda objects as if they were tibbles or data.frames.
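
To make the idea concrete, here is a minimal sketch that adds the same document variable in two ways: with a dplyr verb applied directly to the corpus, and with quanteda's own docvars() replacement function. It assumes the package is already installed (see Installation), and the variable name decade is made up for illustration.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# tidy style: add a docvar with a dplyr verb applied directly to the corpus
# ("decade" is a hypothetical variable name used only for this sketch)
corp_tidy <- data_corpus_inaugural %>%
  mutate(decade = Year %/% 10 * 10)

# base quanteda style: assign into docvars() by hand
corp_base <- data_corpus_inaugural
docvars(corp_base, "decade") <- docvars(corp_base, "Year") %/% 10 * 10

# compare the new document variable and the texts from both routes
all.equal(docvars(corp_tidy, "decade"), docvars(corp_base, "decade"))
identical(as.character(corp_tidy), as.character(corp_base))
```

Both comparisons should come back TRUE, since only the document variables are modified and the texts themselves are left alone.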

Installation

You can install quanteda.tidy from GitHub with:

devtools::install_github("quanteda/quanteda.tidy")

Examples

Adding a document variable for the president's full name:

library("quanteda.tidy", warn.conflicts = FALSE)

Loading required package: quanteda

Package version: 4.2.1

Unicode version: 14.0

ICU version: 71.1

Parallel computing: 10 of 10 threads used.

See https://quanteda.io for tutorials and examples.

data_corpus_inaugural %>% transmute(fullname = paste(FirstName, President, sep = ", ")) %>% summary(n = 5)

Corpus consisting of 59 documents, showing 5 documents:

Text Types Tokens Sentences fullname

1789-Washington 625 1537 23 George, Washington

1793-Washington 96 147 4 George, Washington

1797-Adams 826 2577 37 John, Adams

1801-Jefferson 717 1923 41 Thomas, Jefferson

1805-Jefferson 804 2380 45 Thomas, Jefferson

data_corpus_inaugural %>% mutate(fullname = paste(FirstName, President, sep = ", ")) %>% summary(n = 5)

Corpus consisting of 59 documents, showing 5 documents:

Text Types Tokens Sentences Year President FirstName

1789-Washington 625 1537 23 1789 Washington George

1793-Washington 96 147 4 1793 Washington George

1797-Adams 826 2577 37 1797 Adams John

1801-Jefferson 717 1923 41 1801 Jefferson Thomas

1805-Jefferson 804 2380 45 1805 Jefferson Thomas

Party fullname

none George, Washington

none George, Washington

Federalist John, Adams

Democratic-Republican Thomas, Jefferson

Democratic-Republican Thomas, Jefferson
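
The two summaries above differ only in which document variables survive. A short sketch that checks this directly by listing the docvar names after each verb (the object names corp_transmute and corp_mutate are chosen here for illustration):

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# transmute() keeps only the variables it creates
corp_transmute <- data_corpus_inaugural %>%
  transmute(fullname = paste(FirstName, President, sep = ", "))

# mutate() appends the new variable to the existing ones
corp_mutate <- data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", "))

# list the document variables left after each verb
names(docvars(corp_transmute))
names(docvars(corp_mutate))
```

Judging from the summaries above, the first call should list only fullname, while the second should list the original variables followed by fullname.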

Filtering documents based on a document variable (here, the president's last name):

data_corpus_inaugural %>% filter(President == "Roosevelt") %>% summary()

Corpus consisting of 5 documents, showing 5 documents:

Text Types Tokens Sentences Year President FirstName Party

1905-Roosevelt 404 1079 33 1905 Roosevelt Theodore Republican

1933-Roosevelt 743 2057 85 1933 Roosevelt Franklin D. Democratic

1937-Roosevelt 725 1989 96 1937 Roosevelt Franklin D. Democratic

1941-Roosevelt 526 1519 68 1941 Roosevelt Franklin D. Democratic

1945-Roosevelt 275 633 27 1945 Roosevelt Franklin D. Democratic
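
The same verb works on numeric document variables. The following sketch filters on Year instead and compares the result with quanteda's own corpus_subset(); the year range is picked only as an illustration.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# filter on a numeric docvar: inaugurations from 1933 through 1945
corp_years <- data_corpus_inaugural %>%
  filter(Year >= 1933 & Year <= 1945)
docnames(corp_years)

# the same subset using quanteda's corpus_subset()
corp_subset <- corpus_subset(data_corpus_inaugural, Year >= 1933 & Year <= 1945)
identical(docnames(corp_years), docnames(corp_subset))
```

Given the summary above, this range should pick out the four Franklin D. Roosevelt addresses and leave out the 1905 address by Theodore Roosevelt.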

Renaming and selecting document variables:

data_corpus_inaugural %>% rename(LastName = President) %>% select(FirstName, LastName) %>% summary(n = 5)

Corpus consisting of 59 documents, showing 5 documents:

Text Types Tokens Sentences FirstName LastName

1789-Washington 625 1537 23 George Washington

1793-Washington 96 147 4 George Washington

1797-Adams 826 2577 37 John Adams

1801-Jefferson 717 1923 41 Thomas Jefferson

1805-Jefferson 804 2380 45 Thomas Jefferson

Glimpse (from tibble):

glimpse(data_corpus_inaugural)

Rows: 59

Columns: 6

$ doc_id "1789-Washington", "1793-Washington", "1797-Adams", "1801-Je…

$ text "Fellow-Cit…", "Fellow cit…", "When it wa…", "Friends an…", …

$ Year 1789, 1793, 1797, 1801, 1805, 1809, 1813, 1817, 1821, 1825, …

$ President "Washington", "Washington", "Adams", "Jefferson", "Jefferson…

$ FirstName "George", "George", "John", "Thomas", "Thomas", "James", "Ja…

$ Party none, none, Federalist, Democratic-Republican, Democratic-Re…

Slice operations:

slice(data_corpus_inaugural, 1:3)

Corpus consisting of 3 documents and 4 docvars.

1789-Washington :

"Fellow-Citizens of the Senate and of the House of Representa..."

1793-Washington :

"Fellow citizens, I am again called upon by the voice of my c..."

1797-Adams :

"When it was first perceived, in early times, that no middle ..."

slice_head(data_corpus_inaugural, prop = .10)

Corpus consisting of 5 documents and 4 docvars.

1789-Washington :

"Fellow-Citizens of the Senate and of the House of Representa..."

1793-Washington :

"Fellow citizens, I am again called upon by the voice of my c..."

1797-Adams :

"When it was first perceived, in early times, that no middle ..."

1801-Jefferson :

"Friends and Fellow Citizens: Called upon to undertake the du..."

1805-Jefferson :

"Proceeding, fellow citizens, to that qualification which the..."

slice_tail(data_corpus_inaugural, n = 3)

Corpus consisting of 3 documents and 4 docvars.

2013-Obama :

"Vice President Biden, Mr. Chief Justice, Members of the Unit..."

2017-Trump :

"Chief Justice Roberts, President Carter, President Clinton, ..."

2021-Biden :

"Chief Justice Roberts, Vice President Harris, Speaker Pelosi..."

set.seed(42)
slice_sample(data_corpus_inaugural, prop = .50)

Corpus consisting of 29 documents and 4 docvars.

1981-Reagan :

"Senator Hatfield, Mr. Chief Justice, Mr. President, Vice Pre..."

1933-Roosevelt :

"I am certain that my fellow Americans expect that on my indu..."

1789-Washington :

"Fellow-Citizens of the Senate and of the House of Representa..."

1885-Cleveland :

"Fellow citizens, in the presence of this vast assemblage of ..."

1825-Adams :

"In compliance with an usage coeval with the existence of our..."

1929-Hoover :

"My Countrymen: This occasion is not alone the administration..."

[ reached max_ndoc ... 23 more documents ]
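
Because slice_sample() draws documents at random, the seed matters for reproducibility. A small sketch, assuming the sampling relies on R's standard random number generator, that draws the same sample twice and compares the document names:

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# resetting the seed before each draw should reproduce the same sample
set.seed(42)
sample_a <- slice_sample(data_corpus_inaugural, prop = .50)
set.seed(42)
sample_b <- slice_sample(data_corpus_inaugural, prop = .50)

# compare which documents were drawn, and in which order
identical(docnames(sample_a), docnames(sample_b))

# sample size relative to the full corpus
ndoc(sample_a)
ndoc(data_corpus_inaugural)
```

Without the second set.seed() call, the two draws would generally differ.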