lexicon

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

Description

lexicon is a collection of lexical hash tables, dictionaries, and word lists. The prefix of each dataset's name indicates its data type:

Prefix      Meaning
key_        A data.frame with a lookup and return value
hash_       A keyed data.table hash table
freq_       A data.table of terms with frequencies
profanity_  A vector of profane words
pos_        A part-of-speech vector
pos_df_     A part-of-speech data.frame
sw_         A stopword vector
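To make the key_ idea concrete, here is a minimal base-R sketch of a lookup against a key_-style data.frame. The table `key_toy`, its columns `x` and `y`, its rows, and the helper `lookup()` are hypothetical illustrations, not objects or functions exported by lexicon; the real key_ datasets may use different column names, so check `names()` on one before adapting this.

```r
# Hypothetical toy table shaped like a key_ dataset: one lookup column (x)
# and one return column (y). These rows are illustrative, not lexicon data.
key_toy <- data.frame(
  x = c("can't", "won't", "i'm"),
  y = c("cannot", "will not", "i am"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find each term's row in the lookup column and return
# the value from the return column; unmatched terms come back as NA.
lookup <- function(terms, key) {
  key$y[match(terms, key$x)]
}

expanded <- lookup(c("won't", "hello", "can't"), key_toy)
expanded
# returns c("will not", NA, "cannot")
```

The sketch sticks to base R so it runs without extra packages. The hash_ datasets are keyed data.tables, which perform the same kind of lookup as a keyed join rather than through `match()`.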

Data

Data                              Description
common_names                      First Names (U.S.)
constraining_loughran_mcdonald    Loughran-McDonald Constraining Words
discourse_markers_alemany         Alemany’s Discourse Markers
dodds_sentiment                   Language Assessment by Mechanical Turk Sentiment Words
emojis_sentiment                  Emoji Sentiment Data
enable_word_list                  ENABLE Word List
freq_first_names                  Frequent U.S. First Names
freq_last_names                   Frequent U.S. Last Names
function_words                    Function Words
grady_augmented                   Augmented List of Grady Ward’s English Words and Mark Kantrowitz’s Names List
hash_emojis                       Emoji Description Lookup Table
hash_emojis_identifier            Emoji Identifier Lookup Table
hash_emoticons                    Emoticons
hash_grady_pos                    Grady Ward’s Moby Parts of Speech
hash_internet_slang               List of Internet Slang and Corresponding Meanings
hash_lemmas                       Lemmatization List
hash_power                        Power Lookup Key
hash_sentiment_emojis             Emoji Sentiment Polarity Lookup Table
hash_sentiment_huliu              Hu Liu Polarity Lookup Table
hash_sentiment_inquirer           Inquirer Polarity Lookup Table
hash_sentiment_jockers            Jockers Sentiment Polarity Table
hash_sentiment_jockers_rinker     Combined Jockers & Rinker Polarity Lookup Table
hash_sentiment_loughran_mcdonald  Loughran-McDonald Polarity Table
hash_sentiment_nrc                NRC Sentiment Polarity Table
hash_sentiment_senticnet          Augmented SenticNet Polarity Table
hash_sentiment_sentiword          Augmented Sentiword Polarity Table
hash_sentiment_slangsd            SlangSD Sentiment Polarity Table
hash_sentiment_socal_google       SO-CAL Google Polarity Table
hash_sentiment_vadar              Filtered VADER Polarity Table
hash_strength                     Strength Lookup Key
hash_syllable                     Syllable Counts
hash_valence_shifters             Valence Shifters
key_abbreviation                  Common Abbreviations
key_contractions                  Contraction Conversions
key_grade                         Grades Hash
key_rating                        Ratings Data Set
key_sentiment_jockers             Jockers Sentiment Data Set
modal_loughran_mcdonald           Loughran-McDonald Modal List
nrc_emotions                      NRC Emotions
pos_action_verb                   Action Word List
pos_adverb                        Adverb Word List
pos_df_irregular_nouns            Irregular Nouns Word Data Frame
pos_df_pronouns                   Pronouns
pos_interjections                 Interjections
pos_preposition                   Preposition Words
pos_unchanging_nouns              Nouns That Are the Same in Plural and Singular
profanity_alvarez                 Alejandro U. Alvarez’s List of Profane Words
profanity_arr_bad                 Stack Overflow user2592414’s List of Profane Words
profanity_banned                  bannedwordlist.com’s List of Profane Words
profanity_google                  Google’s List of Profane Words
profanity_von_ahn                 Luis von Ahn’s List of Profane Words
sw_buckley_salton                 Buckley & Salton Stopword List
sw_dolch                          Leveled Dolch List of 220 Common Words
sw_fry_100                        Fry’s 100 Most Commonly Used English Words
sw_fry_1000                       Fry’s 1000 Most Commonly Used English Words
sw_fry_200                        Fry’s 200 Most Commonly Used English Words
sw_fry_25                         Fry’s 25 Most Commonly Used English Words
sw_jockers                        Matthew Jockers’s Expanded Topic Modeling Stopword List
sw_loughran_mcdonald_long         Loughran-McDonald Long Stopword List
sw_loughran_mcdonald_short        Loughran-McDonald Short Stopword List
sw_lucene                         Lucene Stopword List
sw_mallet                         MALLET Stopword List
sw_onix                           Onix Text Retrieval Toolkit Stopword List 1
sw_python                         Python Stopword List

Installation

To install the development version of lexicon, download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/lexicon")

Contact

You are welcome to:
- submit suggestions and bug reports at: https://github.com/trinker/lexicon/issues
- send a pull request on: https://github.com/trinker/lexicon/
- compose a friendly e-mail to: tyler.rinker@gmail.com