easyPubMed: custom_grep – R documentation


custom_grep

Retrieve Text Between XML Tags

Description

Extract text from a string containing XML or HTML tags. The text enclosed by each occurrence of the tag of interest is returned. If multiple tagged substrings are found, they are returned as separate elements of a list or character vector.

Usage

custom_grep(xml_data, tag, format = "list")

Arguments

`xml_data`	String (of class character and length 1): corresponds to the PubMed record or any string including XML/HTML tags.
`tag`	String (of class character and length 1): the tag of interest, given without the enclosing < and > characters.
`format`	One of "list" or "char": specifies whether the output is returned as a list or as a character vector. Defaults to "list".

Details

The input must be a character string (length 1) containing tags in HTML or XML format. If an XML Document is provided as input, the function will raise an error.

Value

List or character vector (depending on `format`) in which each element corresponds to an in-tag substring.
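
To illustrate the two output shapes without depending on the package, here is a minimal base-R sketch of tag extraction. The helper name `extract_between_tags` is hypothetical and is not part of easyPubMed; it is a rough approximation built on `gregexpr()` and `regmatches()`, not the package's actual implementation. It assumes `tag` is a plain tag name with no regular-expression metacharacters, and that the tagged text does not span multiple lines.

```r
# Hypothetical helper (NOT part of easyPubMed): approximates the documented
# behaviour of returning the text found between <tag> and </tag>.
extract_between_tags <- function(xml_data, tag, format = c("list", "char")) {
  format <- match.arg(format)
  # Mirror the documented requirement: single character strings only
  stopifnot(is.character(xml_data), length(xml_data) == 1,
            is.character(tag), length(tag) == 1)

  # Opening tag (optionally with attributes), lazy capture, closing tag
  pattern <- paste0("<", tag, "(\\s[^>]*)?>(.*?)</", tag, ">")

  # All full matches, tags included
  hits <- regmatches(xml_data, gregexpr(pattern, xml_data, perl = TRUE))[[1]]

  # Keep only the captured inner text (second group)
  inner <- sub(pattern, "\\2", hits, perl = TRUE)

  if (format == "list") as.list(inner) else inner
}

demo_string <- paste(
  "I can't wait to watch the <strong>Late Night Show with",
  "Seth Meyers</strong> tonight at <strong>11:30</strong>pm CT!"
)

# Character vector with one element per tagged substring
extract_between_tags(demo_string, tag = "strong", format = "char")

# Same content, returned as a list
extract_between_tags(demo_string, tag = "strong", format = "list")
```

With `format = "char"` the sketch yields a character vector of length 2 ("Late Night Show with Seth Meyers" and "11:30"); with `format = "list"` the same two strings come back as list elements. A string with no matching tags gives an empty vector or list.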

Author(s)

Damiano Fantini <damiano.fantini@gmail.com>

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

try({
  ## extract substrings based on regular expressions
  string_01 <- "I can't wait to watch the <strong>Late Night Show with" 
  string_01 <- paste(string_01, "Seth Meyers</strong> tonight at <strong>11:30</strong>pm CT!")
  print(string_01)
  custom_grep(xml_data = string_01, tag = "strong", format = "char")
  custom_grep(xml_data = string_01, tag = "strong", format = "list")
}, silent = TRUE)

easyPubMed

Search and Retrieve Scientific Publication Records from PubMed

v2.13

GPL-2

Authors

Damiano Fantini

Initial release

2019-03-25