The document summarizes an experiment using the SMART information retrieval system, which was designed in 1964. The SMART system takes documents and queries in English, analyzes the texts automatically, matches queries to documents, and retrieves the most similar items. The experiment used a collection of 1268 library science abstracts to test the SMART system's retrieval effectiveness under different relevance judgments from query authors and outside experts. The results showed that evaluating different processing methods required comparing the ranking of their recall-precision curves rather than comparing individual recall and precision values.
2. INTRODUCTION
The SMART system was designed in 1964, largely as an
experimental tool for the evaluation of the effectiveness of
many different types of analysis and search procedures. Salton
characterizes the system through the following steps of its
function.
It is used to:
Take documents and search queries posed in English
Perform a fully automatic content analysis of texts
Match analyzed search statements and contents of documents
Retrieve the stored items which are most similar to the
queries.
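The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how SMART analyzes or matches texts, so the word-count analysis, the cosine similarity measure, and the names `analyze`, `cosine`, and `retrieve` are all hypothetical stand-ins.

```python
import math
from collections import Counter


def analyze(text):
    """Assumed content analysis: lowercase word counts with punctuation stripped."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed matching function)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=3):
    """Rank stored documents by similarity to the query and return the top_k."""
    q = analyze(query)
    scored = [(cosine(q, analyze(text)), doc_id) for doc_id, text in documents.items()]
    # Highest score first; ties are broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


if __name__ == "__main__":
    docs = {
        "d1": "automatic indexing of library abstracts",
        "d2": "cooking recipes for pasta",
        "d3": "library catalog indexing",
    }
    for doc_id, score in retrieve("automatic indexing", docs, top_k=2):
        print(doc_id, round(score, 3))
```

Each of the analysis options a real system offers would replace the `analyze` step here; the matching and retrieval steps stay the same.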
3. SMART system overview
Thesaurus look-up procedures
Phrase generation methods
Statistical term associations
Hierarchical term expansion, and so on.
4. Evaluation procedures
Though the original SMART experiments
were conducted in a laboratory
environment, the basic aim was to develop a
prototype for a fully automated information
retrieval system.
The evaluation procedures incorporated into
the system lent themselves to a pair-wise
comparison of the effectiveness of two or
more processing methods.
5. Cont..
The following evaluation measures were
generated by the SMART system:
A recall-precision graph reflecting the average
precision value at ten discrete recall points, from
a recall of 0.1 to a recall of 1.0 in intervals of 0.1
Two global measures, known as normalized recall
and normalized precision, which together reflect
the overall performance level of the system
Two simplified global measures, known as rank
recall and log precision, respectively.
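The slides name these measures without giving formulas. The sketch below uses the definitions as they are commonly stated in Salton's publications, where r_i is the rank of the i-th relevant document, n the number of relevant documents, and N the collection size. It assumes a single query whose ranking covers the whole collection, and the interpolation rule used for the ten recall points (best precision at any recall at or above the level) is an assumption, not necessarily the rule SMART applied. All function names are hypothetical.

```python
import math


def relevant_ranks(ranking, relevant):
    """1-based ranks at which the relevant documents appear."""
    return [rank for rank, doc in enumerate(ranking, start=1) if doc in relevant]


def precision_at_recall_levels(ranking, relevant):
    """Precision at recall 0.1, 0.2, ..., 1.0 (assumed interpolation rule)."""
    n = len(relevant)
    points = []  # (recall, precision) observed each time a relevant item is retrieved
    for hits, rank in enumerate(relevant_ranks(ranking, relevant), start=1):
        points.append((hits / n, hits / rank))
    levels = [k / 10 for k in range(1, 11)]
    return [
        max((p for r, p in points if r >= level - 1e-9), default=0.0)
        for level in levels
    ]


def normalized_recall(ranking, relevant):
    """1 - (sum r_i - sum i) / (n (N - n)); requires 0 < n < N."""
    ranks = relevant_ranks(ranking, relevant)
    n, N = len(ranks), len(ranking)
    ideal = sum(range(1, n + 1))
    return 1 - (sum(ranks) - ideal) / (n * (N - n))


def normalized_precision(ranking, relevant):
    """1 - (sum ln r_i - sum ln i) / ln(N! / ((N - n)! n!)); requires 0 < n < N."""
    ranks = relevant_ranks(ranking, relevant)
    n, N = len(ranks), len(ranking)
    numerator = sum(math.log(r) for r in ranks) - sum(math.log(i) for i in range(1, n + 1))
    # lgamma(k + 1) equals ln(k!), which avoids computing huge factorials.
    denominator = math.lgamma(N + 1) - math.lgamma(N - n + 1) - math.lgamma(n + 1)
    return 1 - numerator / denominator


def rank_recall(ranking, relevant):
    """Simplified measure: sum of ideal ranks divided by sum of actual ranks."""
    ranks = relevant_ranks(ranking, relevant)
    return sum(range(1, len(ranks) + 1)) / sum(ranks)


def log_precision(ranking, relevant):
    """Simplified measure: sum of ln(ideal rank) divided by sum of ln(actual rank)."""
    ranks = relevant_ranks(ranking, relevant)
    denominator = sum(math.log(r) for r in ranks)
    if denominator == 0:  # only happens when a single relevant item is ranked first
        return 1.0
    return sum(math.log(i) for i in range(1, len(ranks) + 1)) / denominator
```

Averaging the ten-point precision lists over all queries gives one recall-precision curve per processing method, which is what a pair-wise comparison of methods would plot.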
6. Methodology
A collection of 1268 abstracts in the field of library
science and documentation, comprising about
131,500 English text words, was used for this
experiment.
The collection contained articles mainly published
in American Documentation in 1963 and 1964, and
also in other journals in the given subject area.
7. Cont..
Thus, for each of the 48 queries, four
different document sets became available, each
consisting of the items termed relevant by a
different set of people as follows:
A set – relevance assessed by the query author
B set – relevance assessed by an outside subject expert
C set – items judged relevant by either the A or the B assessor
D set – items judged relevant by both the A and the B assessors.
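In set terms, the C set is the union of the A and B judgments and the D set is their intersection. A minimal sketch for one query, with a hypothetical function name and made-up document identifiers:

```python
def combine_judgments(a_relevant, b_relevant):
    """Build the four relevance sets for one query from the two judges' lists."""
    a, b = set(a_relevant), set(b_relevant)
    return {
        "A": a,      # relevant according to the query author
        "B": b,      # relevant according to the outside subject expert
        "C": a | b,  # relevant according to either judge (union)
        "D": a & b,  # relevant according to both judges (intersection)
    }


if __name__ == "__main__":
    # Hypothetical document ids, for illustration only.
    sets = combine_judgments(["d3", "d7", "d12"], ["d7", "d12", "d40"])
    for name in "ABCD":
        print(name, sorted(sets[name]))
```

By construction D is contained in both A and B, and both are contained in C, so D is the strictest definition of relevance and C the most liberal.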
8. Cont..
The relevance judgment groupings were as follows:
Group   Judges and function
A       Original group of query authors. Each person in the A group made relevance judgments for his or her six queries.
B       Non-author judges. Each person in the B group made relevance judgments for six queries corresponding to six different authors from the A group.
C       The document is relevant to a given query if either the A judge or the B judge termed it relevant.
D       The document is relevant to a given query if both the A and B judges termed it relevant.
9. Results
It was found that, under normal
circumstances, an evaluation of performance
for a variety of processing methods required
an examination of the ranking of the
corresponding recall-precision curves, rather
than a detailed comparison of the actual
recall and precision values. From a ranking of
the recall-precision graphs obtained from the
several processing methods it was noted that