Term extraction tool from textual fields of a manuscript
It extracts terms from a text field (abstract, title, author's keywords, etc.) of a bibliographic data frame.
termExtraction(
  M,
  Field = "TI",
  ngrams = 1,
  stemming = FALSE,
  language = "english",
  remove.numbers = TRUE,
  remove.terms = NULL,
  keep.terms = NULL,
  synonyms = NULL,
  verbose = TRUE
)
M | is a data frame obtained by the converting function convert2df.
Field | is a character object. It indicates the field tag of the textual data to process, e.g. "TI" (titles), "AB" (abstracts), or "ID" (keywords), as used in the examples. The default is Field = "TI".
ngrams | is an integer between 1 and 3. It indicates the type of n-gram to extract from texts. An n-gram is a contiguous sequence of n terms. The function can extract n-grams composed of 1, 2, or 3 terms. The default is ngrams = 1.
stemming | is logical. If TRUE, the Porter stemming algorithm is applied to all extracted terms. The default is stemming = FALSE.
language | is a character string. It is the language of the textual content ("english", "german", "italian", "french", "spanish"). The default is language = "english".
remove.numbers | is logical. If TRUE, all numbers are deleted from the documents before term extraction. The default is remove.numbers = TRUE.
remove.terms | is a character vector. It contains a list of additional terms to delete from the documents before term extraction. The default is remove.terms = NULL.
keep.terms | is a character vector. It contains a list of compound words (formed by two or more terms) to keep in their original form during term extraction. The default is keep.terms = NULL.
synonyms | is a character vector. Each element contains a list of synonyms, separated by ";", that will be merged into a single term (the first term in the vector element). The default is synonyms = NULL.
verbose | is logical. If TRUE, the function prints the most frequent terms extracted from the documents. The default is verbose = TRUE.
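The ngrams argument is easier to picture with a small example. The sketch below is written in base R and is not the package's implementation: make_ngrams is a hypothetical helper, and its simple tokenizer (lower-casing, splitting on non-alphanumeric characters) is an assumption made for illustration only.

```r
# Hypothetical helper, for illustration only (not part of bibliometrix):
# build contiguous word n-grams from a single string.
make_ngrams <- function(text, n = 1) {
  # Lower-case the text and split on anything that is not a letter or digit
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens <- tokens[nzchar(tokens)]            # drop empty tokens
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)   # first word of each n-gram
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

title <- "Bibliographic coupling between scientific papers"
unigrams <- make_ngrams(title, n = 1)
bigrams  <- make_ngrams(title, n = 2)
print(bigrams)
```

With n = 2, each element of bigrams joins two adjacent words of the title, so a five-word title yields four bigrams.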
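The function returns the bibliometric data frame with a new column containing the terms extracted from the field indicated in the argument Field. In the examples, this column is named after the field tag with the suffix _TM (TI_TM, AB_TM).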
See also convert2df to import and convert a WoS or SCOPUS export file into a bibliographic data frame, and biblioAnalysis for bibliometric analysis.
# Example 1: Term extraction from titles
data(scientometrics, package = "bibliometrixData")

# vector of compound words
keep.terms <- c("co-citation analysis", "bibliographic coupling")

# term extraction
scientometrics <- termExtraction(scientometrics, Field = "TI", ngrams = 1,
  remove.numbers = TRUE, remove.terms = NULL, keep.terms = keep.terms, verbose = TRUE)

# terms extracted from the first 10 titles
scientometrics$TI_TM[1:10]

# Example 2: Term extraction from abstracts
data(scientometrics, package = "bibliometrixData")

# term extraction
scientometrics <- termExtraction(scientometrics, Field = "AB", ngrams = 2,
  stemming = TRUE, language = "english", remove.numbers = TRUE,
  remove.terms = NULL, keep.terms = NULL, verbose = TRUE)

# terms extracted from the first abstract
scientometrics$AB_TM[1]

# Example 3: Term extraction from keywords with synonyms
data(scientometrics, package = "bibliometrixData")

# vector of synonyms
synonyms <- c("citation; citation analysis", "h-index; index; impact factor")

# term extraction
scientometrics <- termExtraction(scientometrics, Field = "ID", ngrams = 1,
  synonyms = synonyms, verbose = TRUE)
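To make the effect of keep.terms and synonyms more concrete, here is a minimal base R sketch of what the two arguments are described as doing. Both helpers, protect_compounds and merge_synonyms, are hypothetical names introduced for this illustration. The package may handle compound words and synonyms differently internally, for example with respect to letter case or separators.

```r
# Hypothetical helpers for illustration only; bibliometrix may process
# compound words and synonyms differently internally.

# Join the words of each compound term with "_" so that a whitespace
# tokenizer keeps the compound as a single token.
protect_compounds <- function(text, keep.terms) {
  text <- tolower(text)
  for (k in tolower(keep.terms)) {
    text <- gsub(k, gsub("[ -]", "_", k), text, fixed = TRUE)
  }
  text
}

# Replace every synonym with the first term of its ";"-separated group.
merge_synonyms <- function(terms, synonyms) {
  for (s in synonyms) {
    group <- trimws(strsplit(s, ";", fixed = TRUE)[[1]])
    terms[terms %in% group[-1]] <- group[1]
  }
  terms
}

keep.terms <- c("co-citation analysis", "bibliographic coupling")
protected <- protect_compounds("A co-citation analysis of journals", keep.terms)
tokens <- strsplit(protected, " ", fixed = TRUE)[[1]]
print(tokens)

synonyms <- c("citation; citation analysis", "h-index; index; impact factor")
merged <- merge_synonyms(
  c("citation analysis", "index", "impact factor", "co-authorship"),
  synonyms
)
print(merged)
```

In this sketch the compound "co-citation analysis" survives tokenization as one token, and "index" and "impact factor" are both mapped to "h-index", the first term of their synonym group. Terms that appear in no group, such as "co-authorship", are left unchanged.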