For most of human history, information was recorded in human language. Databases are relatively new. Statistics and probability theory help us understand the numeric information stored in those databases, but we also need statistical tools and techniques for understanding human language.
Which words are the most commonly used words? Which words (or which category of words) are the most common objects of a particular verb or of a particular preposition? Which adjectives are most frequently used to describe a particular noun?
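The first of these questions can be answered with a simple frequency count. Below is a minimal sketch, assuming a naive tokenizer that treats any run of letters (accented letters included) as a word and ignores case. The function name `top_words` is hypothetical. The other questions, about objects of verbs and adjectives describing nouns, would additionally need a part-of-speech tagger or parser.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common words in text, ignoring case."""
    # Naive tokenization (an assumption): keep runs of Unicode letters,
    # dropping digits, underscores, punctuation and whitespace.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Counter.most_common lists ties in the order the words first appeared.
    return Counter(tokens).most_common(n)


sample = "The cat sat on the mat. The dog sat on the cat."
print(top_words(sample, 3))  # [('the', 4), ('cat', 2), ('sat', 2)]
```

Because the pattern matches Unicode letters, words with accented characters are kept whole rather than split apart.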
How do the words used to describe a stock affect its price, and by how much?
To explore these questions, I am writing the text-mining notes below. And to develop new tools for understanding human language, I am annotating the Sicilian language.