Processes large text data files efficiently in batches. To this end, it offers functions for splitting, parsing, and tokenizing text and for creating a vocabulary. It also includes functions for building either a document-term matrix or a term-document matrix and for extracting information from them (term associations, most frequent terms). Lastly, it provides functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. The source code is written in 'C++11' and exported to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
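To make the tokenize, vocabulary, and document-term matrix steps concrete, here is a minimal C++11 sketch. It is not the package's implementation or API: the function names `tokenize`, `build_vocabulary` and `document_term_matrix` are hypothetical, the tokenizer simply lowercases and splits on non-alphanumeric ASCII characters, and the matrix is stored densely for brevity, whereas a package aimed at large files would presumably use sparse storage.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split a document into lowercase tokens,
// treating every non-alphanumeric ASCII character as a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical helper: map each distinct token to a column index.
// std::map keeps keys sorted, so indices follow alphabetical order.
std::map<std::string, std::size_t> build_vocabulary(
        const std::vector<std::string>& docs) {
    std::map<std::string, std::size_t> vocab;
    for (const auto& doc : docs)
        for (const auto& tok : tokenize(doc))
            vocab.emplace(tok, 0);
    std::size_t next = 0;
    for (auto& entry : vocab) entry.second = next++;
    return vocab;
}

// Hypothetical helper: dense document-term count matrix
// (rows = documents, columns = vocabulary terms).
std::vector<std::vector<int>> document_term_matrix(
        const std::vector<std::string>& docs,
        const std::map<std::string, std::size_t>& vocab) {
    std::vector<std::vector<int>> dtm(
        docs.size(), std::vector<int>(vocab.size(), 0));
    for (std::size_t i = 0; i < docs.size(); ++i)
        for (const auto& tok : tokenize(docs[i]))
            ++dtm[i][vocab.at(tok)];
    return dtm;
}
```

Transposing the result gives the term-document form; the most frequent terms can then be read off as the columns with the largest sums.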
Version: | 1.0.9 |
Depends: | R (≥ 3.2.3), Matrix |
Imports: | Rcpp (≥ 0.12.5), R6, data.table, utils |
LinkingTo: | Rcpp, RcppArmadillo (≥ 0.7.5), BH |
Suggests: | testthat, covr, knitr, rmarkdown |
Published: | 2018-01-16 |
Author: | Lampros Mouselimis |
Maintainer: | Lampros Mouselimis <mouselimislampros at gmail.com> |
BugReports: | https://github.com/mlampros/textTinyR/issues |
License: | GPL-3 |
Copyright: | inst/COPYRIGHTS textTinyR copyright details |
URL: | https://github.com/mlampros/textTinyR |
NeedsCompilation: | yes |
SystemRequirements: | The package requires two components: a C++11 compiler and, on a Unix OS, the boost-locale headers and libraries (boost >= 1.55.0, www.boost.org). Debian/Ubuntu: libboost-locale-dev; Fedora: yum install boost-devel; OSX/brew: detailed installation instructions can be found in the README file |
Materials: | README NEWS |
CRAN checks: | textTinyR results |
Reference manual: | textTinyR.pdf |
Vignettes: | Functionality of the textTinyR package |
Package source: | textTinyR_1.0.9.tar.gz |
Windows binaries: | r-devel: textTinyR_1.0.9.zip, r-release: textTinyR_1.0.9.zip, r-oldrel: textTinyR_1.0.9.zip |
OS X El Capitan binaries: | r-release: not available |
OS X Mavericks binaries: | r-oldrel: not available |
Old sources: | textTinyR archive |
Please use the canonical form https://CRAN.R-project.org/package=textTinyR to link to this page.