GENTIO aims for radical innovation in the way we produce, enrich and analyse digital content. The project will develop a flexible Deep Learning Architecture to unify the understanding of text at three fundamental levels: structure, content and context. In the first use case, we will experiment with new methods for communications experts to maximize the impact of data-driven publishing. In the second use case, we will correct and classify noisy output from Optical Character Recognition (OCR) systems.
Recent years have shown major advances in the automated extraction of factual, affective and contextual knowledge from digital content streams. GENTIO builds on these advances to change the way we produce, enrich and analyse digital content. The project will develop a flexible Multi-Task Learning (MTL) approach based on Generative Learning Networks to unify the understanding of text at three fundamental levels: structure, content and context. Thereby it aims to boost the context processing capabilities of Natural Language Processing (NLP) frameworks, reduce the high cost of developing training data, and support the cost-effective development of intelligent semantic systems. By offering interactive visualizations to explore the extracted features, the project will also put special emphasis on increasing the transparency of the underlying computational processes, which is a typical shortcoming of Artificial Intelligence-based systems.
Supported by multilingual and highly scalable knowledge graph technology, the envisioned approach will be applicable across numerous domains and regions. To demonstrate its versatility, two distinct domains have been chosen. The first use case targets the marketing domain. It will experiment with new methods for communication experts to maximize the impact of data-driven publishing. The second use case targets the news media sector, automatically correcting and classifying noisy output from Optical Character Recognition (OCR) systems – using topics extracted from the public debate on other microblogging sites to obtain the required context information.
The two use cases allow GENTIO to investigate the production as well as the analysis of digital content, driven by leading use case partners in their respective fields – Ketchum Publico as the Austrian representative of a global communications consultancy versus the OBSERVER as an established Austrian media intelligence SME with a history of more than 100 years. As part of the exploitation planning in the second half of the project, GENTIO will clearly define the potential of using its MTL capabilities in a variety of other domains including broadcasting (semantic search for video retrieval), retailing and consumer brands (reputation management), telecommunications (helpdesk and support), consulting and auditing (legal text annotation and evaluation) and mobility (crowd-based feedback systems for autonomous driving applications).