Talk: Gensim -- Statistical Semantics in Python

Abstract

Gensim is a library for statistical analysis of plain text. It uses scalable implementations (incremental, memory-independent and optionally distributed) of several popular algorithms, such as Latent Semantic Analysis and Latent Dirichlet Allocation, to find document-document similarities. Particular emphasis is placed on a straightforward, intuitive API, so that the library is easy to apply and to extend. Gensim was created with digital libraries in mind and has been deployed there successfully, but its underlying novel algorithms for large-scale online SVD and LDA are a Swiss Army knife of data analysis -- useful on their own, outside the domain of Natural Language Processing.
