This is the pioneering work in the field of LSA; it describes for the first time what the technique consists of:
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
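To make the abstract concrete, here is a minimal sketch of the pipeline it describes: decompose a term-by-document matrix with SVD, keep the k largest factors, fold a query in as a pseudo-document, and rank documents by cosine. The matrix is a hypothetical toy of my own (not data from the paper), the helper name `lsa_query_cosines` is made up for illustration, the query uses unweighted 0/1 term weights, and I assume the usual fold-in form q_hat = q^T U_k S_k^{-1} with documents taken as rows of V_k.

```python
import numpy as np

# Hypothetical toy term-by-document counts (NOT the paper's data).
terms = ["human", "interface", "computer", "graph", "trees"]
docs = ["d0", "d1", "d2", "d3", "d4"]
X = np.array([
    [1, 1, 0, 0, 0],  # human
    [0, 1, 1, 0, 0],  # interface
    [1, 0, 1, 0, 0],  # computer
    [0, 0, 0, 2, 1],  # graph
    [0, 0, 0, 1, 1],  # trees
], dtype=float)


def lsa_query_cosines(X, query_terms, terms, k):
    """Cosine between a folded-in query and every document in k-factor space."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest factors.
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T
    # Query as a 0/1 term vector (an assumption; the paper allows weights).
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    # Fold the query in as a pseudo-document: q_hat = q^T U_k S_k^{-1}.
    q_hat = (q @ U_k) / s_k
    # Cosine of q_hat against each document vector (rows of V_k).
    num = V_k @ q_hat
    den = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return num / den


cosines = lsa_query_cosines(X, {"human"}, terms, k=2)
for name, c in zip(docs, cosines):
    print(f"{name}: {c:+.3f}")
```

With this toy matrix, d0 to d2 should score close to 1 and d3 and d4 close to 0. Note that d2 does not contain "human" at all, which is the kind of match plain keyword lookup would miss. The toy is deliberately split into two unrelated topics, so real collections will give far less clean numbers.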
The paper gives an example that went on to become a classic and has been discussed in many later papers, and foreseeably will be in mine as well.
A concrete example may make the procedure and its putative advantages clearer. Table 2 gives a sample dataset. In this case, the document set consisted of the titles of 9 Bellcore technical memoranda. Words occurring in more than one title were selected for indexing; they are italicized. Note that there are two classes of titles: five about human-computer interaction (labeled c1-c5) and four about graph theory (labeled m1-m4). The entries in the term by document matrix are simply the frequencies with which each term actually occurred in each document. Such a matrix could be used directly for keyword-based retrievals or, as here, for the initial input of the SVD analysis.
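The matrix construction described in the quote can be sketched roughly as follows: tokenize each title, keep as index terms the words that occur in more than one title, and fill each cell with the raw frequency of the term in the document. The titles below are hypothetical stand-ins (not the Bellcore memoranda), and the small stop list and the helper name `term_document_matrix` are assumptions of mine.

```python
from collections import Counter

# Hypothetical titles for illustration (NOT the paper's memoranda).
titles = {
    "c1": "human interface for computer applications",
    "c2": "user opinion of computer response time",
    "m1": "generation of random trees",
    "m2": "intersection graph of paths in trees",
    "m3": "graph minors a survey",
}
STOP_WORDS = {"a", "for", "in", "of"}  # assumed stop list


def term_document_matrix(titles, stop_words):
    """Return (index_terms, matrix) with one row per term, one column per title."""
    # Lowercase, split on whitespace, and drop stop words.
    tokens = {
        doc: [w for w in text.lower().split() if w not in stop_words]
        for doc, text in titles.items()
    }
    # Document frequency: in how many titles does each word appear?
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    # Index only the words that occur in more than one title.
    index_terms = sorted(w for w, n in doc_freq.items() if n > 1)
    # Each cell is the raw count of the term in that title.
    matrix = [[tokens[doc].count(term) for doc in titles] for term in index_terms]
    return index_terms, matrix


index_terms, matrix = term_document_matrix(titles, STOP_WORDS)
for term, row in zip(index_terms, matrix):
    print(f"{term:10s} {row}")
```

The resulting rows can be passed as they are to an SVD routine, or used directly for keyword matching, as the quote points out.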
Paper: Indexing by Latent Semantic Analysis (PDF)
Authors: Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman
Publication: Journal of the American Society for Information Science, 1990
BibTeX entry:
@article{deerwester1990indexing,
  author = "Scott C. Deerwester and Susan T. Dumais and Thomas K. Landauer and George W. Furnas and Richard A. Harshman",
  title = "Indexing by Latent Semantic Analysis",
  journal = "Journal of the American Society for Information Science",
  volume = "41",
  number = "6",
  pages = "391-407",
  year = "1990"
}