About

Arquivo.pt preserves millions of files collected from the web since 1996 and provides a public search service over this information, which spans several languages. It periodically collects and stores information published on the web, then processes the collected data to make it searchable, providing a "Google-like" service for searching the past web (English user interface available at archive.pt). This preservation workflow runs on a large-scale distributed information system.


In this project we aim to provide creative and innovative ways of exploring the data preserved by Arquivo.pt. "Conta-me Historias" offers a narrative temporal view that gives users a historical perspective on their searches. To guarantee the plurality and diversity of the information, we draw on 24 Portuguese news providers. Users can thus construct their own narrative, following either a more credible source or a more sensationalist one. This approach offers journalists a privileged environment for researching past events, historians the possibility to revisit the past, and citizens democratic and plural access to an enormous wealth of information.


To showcase the Arquivo.pt data, we show the user the most important excerpts (namely text titles) of a topic over time. To select the best news titles we rely on YAKE!, a keyword extractor developed by our team, which recently received the Best Short Paper Award at the 40th European Conference on Information Retrieval (ECIR'18). Additionally, we use SentiLex-PT01, a sentiment analysis lexicon for the Portuguese language developed by a national team of researchers, to analyze the sentiment of the titles that YAKE! selects as relevant.
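The idea of picking the most relevant titles per period and attaching a sentiment value to each can be illustrated with a minimal sketch. It is not the project's implementation: the `relevance` function is a simple stand-in that counts query terms (it is not YAKE!), `TOY_LEXICON` is a hypothetical four-word lexicon (it is not SentiLex-PT01), and the `Headline` type, source names and sample titles are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Headline:
    year: int
    source: str
    title: str


# Toy polarity lexicon with hypothetical entries (NOT SentiLex-PT01).
TOY_LEXICON = {"crise": -1, "derrota": -1, "vitória": 1, "recorde": 1}


def relevance(title: str, query: str) -> int:
    """Stand-in relevance score (NOT YAKE!): query terms present in the title."""
    words = set(title.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def polarity(title: str) -> int:
    """Sum the lexicon polarity of every word in the title."""
    return sum(TOY_LEXICON.get(word, 0) for word in title.lower().split())


def timeline(headlines, query, per_year=1):
    """Group matching headlines by year and keep the top-scoring titles."""
    by_year = defaultdict(list)
    for h in headlines:
        score = relevance(h.title, query)
        if score > 0:
            by_year[h.year].append((score, h))

    result = {}
    for year in sorted(by_year):
        # Stable sort: ties keep their original order.
        ranked = sorted(by_year[year], key=lambda pair: -pair[0])
        result[year] = [(h.title, polarity(h.title)) for _, h in ranked[:per_year]]
    return result


if __name__ == "__main__":
    sample = [
        Headline(2010, "source-a", "Crise financeira agrava derrota do governo"),
        Headline(2010, "source-b", "Governo reage"),
        Headline(2012, "source-a", "Vitória do governo nas eleições"),
    ]
    for year, titles in timeline(sample, "governo crise").items():
        print(year, titles)
```

In a real setting the stand-in scorer and toy lexicon would be replaced by an actual keyword extractor and a full sentiment lexicon. The grouping by period and the per-period selection would stay the same.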



YAKE! References

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In Proceedings of the 40th European Conference on Information Retrieval (ECIR'18). Grenoble, France, March 26-29, pp. 684-691.

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A. (2018). YAKE! Collection-independent Automatic Keyword Extractor. In Proceedings of the 40th European Conference on Information Retrieval (ECIR'18). Grenoble, France, March 26-29, pp. 806-810.


SentiLex-PT01

Silva, M., Carvalho, P., Costa, C., Sarmento, L. (2010). Automatic Expansion of a Social Judgment Lexicon for Sentiment Analysis. Technical Report TR 10-08. University of Lisbon, Faculty of Sciences, LASIGE, December 2010. doi: 10455/6694

PAMPO

Rocha, C., Jorge, A., Sionara, R., Brito, P., Pimenta, C., Rezende, S. (2016). PAMPO: Using Pattern Matching and POS-Tagging for Effective Named Entities Recognition in Portuguese.