
get_nexis_html

Extract texts and meta data from Nexis HTML files


Description

This function extracts headings, body texts and meta data (date, byline, length, section, edition) from items in HTML files downloaded by the scraper.

Usage

get_nexis_html(path, paragraph_separator = "\n\n", verbosity, ...)

Arguments

path

either a path to an HTML file or a path to a directory that contains HTML files

paragraph_separator

a character string used to separate paragraphs in body texts

verbosity
  • 0: output errors only

  • 1: output errors and warnings (default)

  • 2: output a brief summary message

  • 3: output detailed file-related messages

...

only to trap extra arguments
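
As a rough illustration of what paragraph_separator controls, the base R sketch below joins a few paragraphs with the default separator from the Usage section and with a custom one. The paragraphs vector and the variable names are hypothetical stand-ins for the body text of one article, and paste() is used only to show the effect; this is not a claim about how get_nexis_html is implemented internally.

```r
# Hypothetical paragraphs standing in for the body text of one article.
paragraphs <- c("First paragraph.", "Second paragraph.")

# Joined with the default separator shown in the Usage section ("\n\n").
body_default <- paste(paragraphs, collapse = "\n\n")

# Joined with a custom separator, as one might pass via paragraph_separator.
body_custom <- paste(paragraphs, collapse = " | ")

# Print both results so the difference is visible.
cat(body_default, "\n")
cat(body_custom, "\n")
```

With the default, paragraphs stay separated by a blank line, which makes it possible to split the body text back into paragraphs later with strsplit(body_default, "\n\n", fixed = TRUE).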

Examples

## Not run: 
irt <- readtext:::get_nexis_html('tests/data/nexis/irish-times_1995-06-12_0001.html')
afp <- readtext:::get_nexis_html('tests/data/nexis/afp_2013-03-12_0501.html')
gur <- readtext:::get_nexis_html('tests/data/nexis/guardian_1986-01-01_0001.html')
sun <- readtext:::get_nexis_html('tests/data/nexis/sun_2000-11-01_0001.html')
spg <- readtext:::get_nexis_html('tests/data/nexis/spiegel_2012-02-01_0001.html', 
                                  language_date = 'german')

all <- readtext('tests/data/nexis', source = 'nexis')

## End(Not run)

readtext

Import and Handling for Plain and Formatted Text Files

v0.80
GPL-3
Authors
Kenneth Benoit [aut, cre, cph], Adam Obeng [aut], Kohei Watanabe [ctb], Akitaka Matsuo [ctb], Paul Nulty [ctb], Stefan Müller [ctb]
Initial release
