Text Analysis Pipelines: Towards Ad-hoc Large-scale Text Mining (Lecture Notes in Computer Science)
by Henning Wachsmuth
2016 / English / PDF
18.6 MB
This monograph proposes a comprehensive and fully automatic
approach to designing text analysis pipelines for arbitrary
information needs that are optimal in terms of run-time
efficiency and that robustly mine relevant information from text
of any kind. Based on state-of-the-art techniques from machine
learning and other areas of artificial intelligence, novel
pipeline construction and execution algorithms are developed and
implemented in prototypical software. Formal analyses of the
algorithms and extensive empirical experiments underline that the
proposed approach represents an essential step towards the ad-hoc
use of text mining in web search and big data analytics.
Both web search and big data analytics aim to fulfill people's
needs for information in an ad-hoc manner. The information sought
is often hidden in large amounts of natural language text.
Instead of simply returning links to potentially relevant texts,
leading search and analytics engines have started to directly
mine relevant information from the texts. To this end, they
execute text analysis pipelines that may consist of several
complex information-extraction and text-classification stages.
Due to practical requirements of efficiency and robustness,
however, the use of text mining has so far been limited to
anticipated information needs that can be fulfilled with rather
simple, manually constructed pipelines.