Vector Space Model
The model captures how many documents contain a term and which terms are important in each document.
The vector space model typically weights terms by term frequency and inverse document frequency. Its first use was in the SMART information retrieval system. Both documents and queries are represented using the bag-of-words model.
A vector space (also called a linear space) is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers called scalars. Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms. The two key quantities in the vector space model are term frequency (tf) and inverse document frequency (idf). These relatively simple models are especially good at representing phenomena that are not usually considered numerical.
It is used in information filtering, information retrieval, indexing, and relevancy ranking. The main score functions are based on tf and idf.
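As a rough illustration of tf and idf weighting, here is a minimal Python sketch. It assumes raw term counts for tf and idf = log(N / df), which is only one of several common variants; the function name `tf_idf` and the sample documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf is the raw count and idf is log(N / df).
    Other weighting schemes (smoothed idf, log-scaled tf) are common too.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
```

Under these assumptions, a term that appears in every document (here "the") gets weight zero, while a term found in only one document (here "ran") gets the highest weight.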
Whether you explicitly understand this or not, you've used it in your machine learning projects. The model assumes that the relevance of a document to a query is roughly equal to the document-query similarity. Vector space models are representations built from vectors.
The vector space model is a statistical model for representing text information for information retrieval, NLP, and text mining. It is an algebraic model involving two steps: in the first step, we represent the text documents as vectors of words, and in the second step, we transform them to a numerical format so that we can apply text mining techniques such as information retrieval, information extraction, and information filtering. But a document can mean any object you're trying to model.
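The two steps can be sketched in Python as follows. This is only an illustration under simple assumptions: whitespace tokenization, lowercasing, and raw counts as the numerical format. The helper names `build_vocabulary` and `to_vector` are hypothetical.

```python
from collections import Counter

def build_vocabulary(docs):
    """Step 1: collect the sorted set of distinct words across all documents."""
    return sorted({word for doc in docs for word in doc.lower().split()})

def to_vector(doc, vocabulary):
    """Step 2: turn a document into term counts, one entry per vocabulary word."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical toy corpus.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
# vocab      -> ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# vectors[0] -> [1, 0, 1, 1, 1, 2]
# vectors[1] -> [0, 1, 0, 0, 1, 1]
```

Word order is discarded, which is what the bag-of-words representation means: only the counts per vocabulary entry survive.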
The vector space model ranks documents based on the similarity between the query vector and the document vector. There are many ways to compute the similarity between two vectors; one way is to compute the inner product. The vector space model, or term vector model, is an algebraic model for representing text documents (and any objects in general) as vectors of identifiers, such as index terms. Each document is a vector of feature weights, and the model represents documents in an n-dimensional space.
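Ranking by similarity might look like the Python sketch below. It shows the inner product mentioned above alongside cosine similarity, a widely used length-normalized alternative that is swapped in here for comparison. The vectors are hypothetical term counts over a six-word vocabulary, and the function names are chosen for this example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Inner product normalized by vector lengths; 0.0 if either is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def rank(query_vec, doc_vecs, similarity=cosine):
    """Return document indices ordered from most to least similar."""
    scores = [similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Hypothetical count vectors over the vocabulary
# ['cat', 'dog', 'mat', 'on', 'sat', 'the'].
doc_vecs = [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
query_vec = [0, 1, 0, 0, 0, 1]  # the query "the dog"
ranking = rank(query_vec, doc_vecs)
```

In this toy case both documents have the same inner product with the query, because the longer document repeats "the". Cosine similarity ranks the second document first, since it divides out document length.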