# Keyword Extraction
This code identifies the keywords in a document, their positions in the document,
their importance based on [TF-IDF](https://en.wikipedia.org/wiki/Tf-idf), and their grammatical
functions based on [POS tags](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).
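
To make the TF-IDF part concrete, here is a minimal, self-contained sketch of the textbook formula (term frequency multiplied by the log-scaled inverse document frequency). It is an illustration only and not the code in this module: the class `TfIdfSketch`, the method `tfIdf`, and the toy corpus are hypothetical names invented for this example, and the actual extractor may weight terms differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of plain TF-IDF; not part of this module's API.
final class TfIdfSketch {

    /** Scores every distinct term of {@code doc} against {@code corpus}. */
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        // Raw occurrence counts for each term in the document
        Map<String, Integer> counts = new HashMap<>();
        for (String term : doc) {
            counts.merge(term, 1, Integer::sum);
        }

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String term = e.getKey();

            // Term frequency: share of the document's tokens that are this term
            double tf = e.getValue() / (double) doc.size();

            // Document frequency: number of corpus documents containing the term
            long df = corpus.stream().filter(d -> d.contains(term)).count();

            // Inverse document frequency, ln(N / df); the max() guards against
            // division by zero when the document is not part of the corpus
            double idf = Math.log(corpus.size() / (double) Math.max(1L, df));

            scores.put(term, tf * idf);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy corpus of pre-tokenized documents (assumed input format)
        List<List<String>> corpus = List.of(
                List.of("search", "engine", "keyword"),
                List.of("search", "index"),
                List.of("search", "crawler", "index"));

        // "search" occurs in every document, so its idf (and score) is zero;
        // rarer terms such as "keyword" receive a positive score
        System.out.println(tfIdf(corpus.get(0), corpus));
    }
}
```

A term that appears in every document contributes no signal under this formula, which is why common words tend to drop out of a keyword list without needing an explicit stop-word filter.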
## Central Classes
* [DocumentKeywordExtractor](src/main/java/nu/marginalia/keyword/DocumentKeywordExtractor.java)
* [KeywordMetadata](src/main/java/nu/marginalia/keyword/KeywordMetadata.java)
## See Also
* [libraries/language-processing](../../libraries/language-processing) does a lot of the heavy lifting.