


Since 2001, when I began my research as a PhD student, my work has mainly dealt with information management, the integration of heterogeneous information sources, semantics, ontologies, keyword-based search and the Semantic Web.

The main topics I have addressed in my research so far are:

  1. Intelligent integration of data and information
  2. Software agents and data integration
  3. Keyword search on structured databases
  4. Big Data analytics and data mining on large amounts of data

1. Intelligent integration of data and information
A large amount of structured data is currently available, mainly in relational and RDF formats. In this context, the integration of data from heterogeneous data sources is a major challenge that the research community in databases and artificial intelligence has been addressing. In this area, I participated in the development of MOMIS (Mediator Environment for Multiple Information Sources), a mediator system that aims to provide a uniform interface for accessing and querying heterogeneous information sources.
In this field, my research dealt with the following aspects: 1) definition of the functional specifications of the MOMIS architecture; 2) exploitation of a mediator-based system in the area of e-commerce; 3) data integration in the presence of data evolution; 4) exploitation of virtual views built with a data integration system as domain ontologies; 5) development of techniques for the semi-automatic annotation of sources; 6) summarization of the domain of an attribute through a subset of its values; 7) integration of data and multimedia sources; 8) integration of data and web services.
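The general mediator idea can be illustrated with a minimal sketch, assuming a global schema Company(name, city) and two hypothetical sources with different local schemas. Each wrapper function (hypothetical, like the source data) translates its source's records into the global schema, and a selection on the global view is answered by unfolding it over every wrapped source, in a global-as-view style. This is not MOMIS code: MOMIS derives its global view through a semi-automatic integration process, while here the mappings are written by hand.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Dict[str, str]

# Two hypothetical sources describing companies with different local schemas.
SOURCE_A: List[Record] = [
    {"company_name": "Acme", "town": "Modena"},
    {"company_name": "Beta", "town": "Bologna"},
]
SOURCE_B: List[Tuple[str, str, str]] = [
    ("Gamma S.p.A.", "Modena", "IT"),
]


def wrap_a(rows: List[Record]) -> Iterator[Record]:
    """Map source A (company_name, town) onto the global schema (name, city)."""
    for r in rows:
        yield {"name": r["company_name"], "city": r["town"]}


def wrap_b(rows: List[Tuple[str, str, str]]) -> Iterator[Record]:
    """Map source B (name, city, country) tuples onto the global schema."""
    for name, city, _country in rows:
        yield {"name": name, "city": city}


# The global view is defined as the union of the wrapped sources.
SOURCES = [(wrap_a, SOURCE_A), (wrap_b, SOURCE_B)]


def query_global(predicate: Callable[[Record], bool]) -> List[Record]:
    """Answer a selection on the global view by unfolding it over all sources."""
    results: List[Record] = []
    for wrapper, rows in SOURCES:
        results.extend(rec for rec in wrapper(rows) if predicate(rec))
    return results


print([r["name"] for r in query_global(lambda r: r["city"] == "Modena")])
# ['Acme', 'Gamma S.p.A.']
```

The user writes the query once against the global schema and never sees the differences between `company_name`/`town` and the positional tuples of the second source.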

2. Software agents and data integration
I participated in the development of MIKS (Mediator agent for Integration of Knowledge Sources), an implementation of the MOMIS data integration system that uses agent technology to improve access to, and querying of, semi-structured sources.

3. Keyword search on structured databases
As relational data grow more complex and the user base shifts towards less technically skilled users, keyword search is becoming an attractive alternative to traditional SQL queries, mainly because of its simplicity. Unfortunately, this simplicity comes at the price of inherent ambiguity. The challenge of answering a keyword query over a relational database is therefore to discover the database structures that contain the keywords and to explore how these structures are interconnected to form an answer. The discovered structures, together with their interconnections, represent in relational terms the semantic interpretation of the keyword query. Three approaches have been developed to address this issue: KEYMANTIC, KEYRY and QUEST, based respectively on an extension of the Hungarian algorithm, on a Hidden Markov Model, and on the fusion of different probabilistic frameworks through a Dempster-Shafer based approach.
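The mapping step that KEYMANTIC addresses with an extension of the Hungarian algorithm is an instance of the assignment problem: under the one-to-one assumption made here, each keyword is mapped to a distinct database term (a relation, an attribute or a value domain) so that the total weight of the mapping is maximal. The following minimal sketch, assuming hypothetical keyword/term weights, solves that problem by exhaustive search, which reaches the same optimum the Hungarian algorithm finds in polynomial time but is only practical for a few keywords. It does not reproduce KEYMANTIC's weighting scheme, and it returns only the single best mapping.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Weights = Dict[str, Dict[str, float]]


def best_mapping(keywords: List[str], terms: List[str],
                 weight: Weights) -> Tuple[Dict[str, str], float]:
    """Return the one-to-one keyword -> term mapping with maximal total weight.

    Exhaustive search over the assignment problem that the Hungarian
    algorithm solves in polynomial time; only suitable for a few keywords.
    """
    best: Dict[str, str] = {}
    best_score = float("-inf")
    # Every ordered choice of len(keywords) distinct terms is a candidate mapping.
    for chosen in permutations(terms, len(keywords)):
        score = sum(weight[k][t] for k, t in zip(keywords, chosen))
        if score > best_score:
            best, best_score = dict(zip(keywords, chosen)), score
    return best, best_score


# Hypothetical weights between keywords and database terms.
weights: Weights = {
    "movie":     {"Movie": 0.9, "Movie.title": 0.5, "Person.name": 0.1},
    "spielberg": {"Movie": 0.1, "Movie.title": 0.3, "Person.name": 0.8},
}
mapping, score = best_mapping(["movie", "spielberg"],
                              ["Movie", "Movie.title", "Person.name"], weights)
print(mapping)  # {'movie': 'Movie', 'spielberg': 'Person.name'}
```

Replacing the exhaustive loop with a polynomial assignment solver would not change the result, only the cost of obtaining it.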

4. Big Data analytics and data mining on large amounts of data

I participated in the development of techniques for understanding the content of large datasets, for applying rule-based entity resolution to large datasets, and for analyzing Wikipedia infoboxes.
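Rule-based entity resolution on large datasets is commonly made tractable with blocking: records are first grouped by a cheap key, and the (quadratic) pairwise match rules are applied only within each group. The sketch below assumes hypothetical records, a blocking key on the city attribute and a single hypothetical rule (same city and token Jaccard similarity of names of at least 0.5). It illustrates the general technique, not the specific techniques developed in this work.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, object]


def tokens(text: str) -> FrozenSet[str]:
    return frozenset(text.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_rule(r1: Record, r2: Record, threshold: float = 0.5) -> bool:
    """Hypothetical rule: same city AND sufficiently similar names."""
    return (r1["city"] == r2["city"]
            and jaccard(tokens(str(r1["name"])), tokens(str(r2["name"]))) >= threshold)


def resolve(records: List[Record], blocking_key: str = "city") -> List[Tuple[object, object]]:
    """Return pairs of record ids that the rule considers the same entity."""
    blocks: Dict[object, List[Record]] = defaultdict(list)
    for r in records:
        blocks[r[blocking_key]].append(r)
    matches = []
    for block in blocks.values():
        # Pairwise comparisons happen only inside a block, not across the whole dataset.
        for r1, r2 in combinations(block, 2):
            if match_rule(r1, r2):
                matches.append((r1["id"], r2["id"]))
    return matches


records = [
    {"id": 1, "name": "Univ. of Modena and Reggio Emilia", "city": "Modena"},
    {"id": 2, "name": "University of Modena and Reggio Emilia", "city": "Modena"},
    {"id": 3, "name": "University of Milan", "city": "Milano"},
    {"id": 4, "name": "Modena Research Lab", "city": "Modena"},
]
print(resolve(records))  # [(1, 2)]
```

Record 3 is never compared with the others because it falls in a different block; on large datasets this pruning is what keeps the number of rule evaluations manageable.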


Participation in Italian and European Projects

  1. the Italian PRIN project WISDOM (Web Intelligent Search based on DOMain ontologies), years 2004-2006, where I worked on the project themes "Creation and update of a domain ontology" and "Emergent semantics: discovering semantic mappings among domain ontologies", and supported the coordination of the activities.
  2. the European SEWASIE project (SEmantic Web and AgentS in Integrated Economies), years 2002-2005, where I worked in particular on WP2 (Semantic enrichment and integration), defining semantic enrichment processes for information sources and developing the virtual data stores that constitute the information nodes accessible by users, and on WP10 (Project management), where I was in charge of publishing the final report and the showcase.
  3. the EU project STASIS (Software for Ambient Semantic Interoperable Services), years 2006-2009, which aimed to enable SMEs to participate in the e-Economy by offering a coherent set of semantic applications.
  4. the Italian FIRB project NEP4B (Networked Peers for Business), years 2006-2009, which developed an infrastructure enabling companies of any nature, size and location to search for partners, exchange data and collaborate. I was involved in WP2 (Design of the single-peer functional architecture), WP3 (Prototyping and validation), WP5 (Dissemination and exploitation, as the person responsible for UNIMORE) and WP6 (Project management, supporting the Coordinator’s tasks).
  5. the local project "Searching for a needle in mountains of data", years 2008-2010, funded by the Fondazione Cassa di Risparmio di Modena, which addressed the issue of easily querying a mediator-based system.
  6. the COST Action KEYSTONE (semantic KEYword-based Search on sTructured data sOurcEs, IC1302), years 2013-2017, where I was the Chair of the Management Committee.
  7. the CEF Telecom Re-search Alps project, years 2017-2019, where I was the Principal Investigator.


Main Results

  • RELEVANT (RELEvant VAlues geNeraTor)
    Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no attention paid to extensional knowledge. We developed a technique to enrich the intension of an attribute with a new sort of metadata: the “relevant values”, extracted from the attribute values. Relevant values enrich schemata with domain knowledge; moreover, users can exploit them in the interactive process of creating and refining a query. The technique, fully implemented in a prototype, is automatic and independent of the attribute domain, and it is based on data mining clustering techniques and on semantics emerging from data values. It can be parameterized with different similarity metrics and is a viable tool for dealing with frequently changing sources, as in the Semantic Web context.
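The idea of relevant values can be sketched as follows, assuming a simple single-link clustering of the values of one attribute on token Jaccard similarity, with a hypothetical threshold of 0.3, and labelling each cluster with the tokens shared by all its members. The prototype described above is parameterized with different similarity metrics; the metric, threshold, labelling rule and sample values used here are illustrative choices, not the ones used by RELEVANT.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def tokens(value: str) -> FrozenSet[str]:
    return frozenset(value.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevant_values(values: List[str], threshold: float = 0.3) -> List[Tuple[str, List[str]]]:
    """Group similar attribute values and return (label, members) pairs."""
    parent = list(range(len(values)))  # union-find forest over value indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(v) for v in values]
    # Single-link clustering: merge any two values that are similar enough.
    for i, j in combinations(range(len(values)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: Dict[int, List[str]] = {}
    for i, v in enumerate(values):
        clusters.setdefault(find(i), []).append(v)

    result = []
    for members in clusters.values():
        # Label = tokens common to every member; fall back to the first value.
        common = frozenset.intersection(*(tokens(m) for m in members))
        result.append((" ".join(sorted(common)) or members[0], members))
    return result


category = ["red wine", "white wine", "sparkling wine",
            "olive oil", "extra virgin olive oil", "balsamic vinegar"]
for label, members in relevant_values(category):
    print(label, "->", members)
# wine -> ['red wine', 'white wine', 'sparkling wine']
# oil olive -> ['olive oil', 'extra virgin olive oil']
# balsamic vinegar -> ['balsamic vinegar']
```

The three labels summarize six raw values, which is the kind of schema-level insight a user could exploit while refining a query on the `category` attribute.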

Main Publications

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori, M. Vincini: “RELEVANT News: a semantic news feed aggregator”, 4th Workshop on Semantic Web Applications and Perspectives (SWAP 2007), Bari, Italy, December 18-20, 2007

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “Extracting Relevant Attribute Values for Improved Search”, IEEE Internet Computing, vol. 11, no. 5, pp. 26-35, Sept/Oct 2007 (special issue on Semantic-Web-Based Knowledge Management), ISSN 1089-7801

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “A new type of metadata for querying data integration systems”, Proceedings of the Convegno Nazionale Sistemi di Basi di Dati Evolute (SEBD 2007), Torre Canne (Fasano, BR), 17-20 June 2007, pp. 266-273, ISBN 978-88-902981-0-3

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “Relevant values: new metadata to provide insight on attribute values at schema level”, Proceedings of the 9th International Conference on Enterprise Information Systems, Funchal, Madeira, Portugal, 12-16 June 2007, pp. 274-279, ISBN 978-972-8865-88-7

  • Keyword Search over structured databases
    Keyword queries offer a convenient alternative to traditional SQL for querying relational databases with large, often unknown, schemas and instances. The challenge in answering such queries is to discover their intended semantics, construct the SQL queries that describe them and use these queries to retrieve the respective tuples. Existing approaches typically rely on indices built a priori on the database content. This seriously limits their applicability when a priori access to the database content is not possible, as with online databases accessed through web interfaces, or with sources in information integration systems that operate behind wrappers with specific query capabilities. Furthermore, the existing literature has not fully studied the interdependencies among the ways in which the different keywords are mapped to database values and schema elements. In this research, we developed techniques and prototypes that are mainly based on metadata.
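The interdependencies between keyword mappings mentioned above are what a Hidden Markov Model can capture, and an HMM is the basis of KEYRY (Section 3). A minimal sketch, assuming database terms as hidden states, keywords as observations, and hypothetical start, transition and emission probabilities (which could, for instance, reflect schema connectivity and keyword/term similarity): the Viterbi algorithm returns the most likely sequence of terms for the whole query, so the choice for one keyword depends on the choices for its neighbours. It returns only the single best interpretation; a ranked list would need a list-Viterbi variant such as the one studied in the CIKM 2011 paper listed below.

```python
import math
from typing import Dict, List, Tuple

Probs = Dict[str, Dict[str, float]]


def viterbi(observations: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Probs, emit_p: Probs) -> Tuple[float, List[str]]:
    """Most likely hidden-state path for the observations (log-space Viterbi).

    All probabilities must be strictly positive, since log(0) is undefined.
    """
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Best predecessor for s, including the transition cost into s.
            score, path = max((best[p][0] + math.log(trans_p[p][s]), best[p][1])
                              for p in states)
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())


# Hypothetical model: states are database terms, observations are keywords.
states = ["Person.name", "Movie.title", "Movie.year"]
start_p = {"Person.name": 0.4, "Movie.title": 0.4, "Movie.year": 0.2}
trans_p = {
    "Person.name": {"Person.name": 0.2, "Movie.title": 0.6, "Movie.year": 0.2},
    "Movie.title": {"Person.name": 0.4, "Movie.title": 0.2, "Movie.year": 0.4},
    "Movie.year":  {"Person.name": 0.3, "Movie.title": 0.5, "Movie.year": 0.2},
}
emit_p = {
    "Person.name": {"spielberg": 0.7, "jaws": 0.2, "1975": 0.1},
    "Movie.title": {"spielberg": 0.2, "jaws": 0.7, "1975": 0.1},
    "Movie.year":  {"spielberg": 0.05, "jaws": 0.05, "1975": 0.9},
}
log_p, path = viterbi(["spielberg", "jaws"], states, start_p, trans_p, emit_p)
print(path)  # ['Person.name', 'Movie.title']
```

Working in log space avoids numerical underflow when queries contain many keywords.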

Main Publications

Francesco Guerra, Sonia Bergamaschi, Mirko Orsini, Antonio Sala, Claudio Sartori: Keymantic: A Keyword-based Search Engine using Structural Knowledge. ICEIS (1) 2009: 241-246.

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Mirko Orsini, Raquel Trillo Lado, Yannis Velegrakis: Keymantic: Semantic Keyword-based Searching in Data Integration Systems. PVLDB 3(2): 1637-1640 (2010)

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Raquel Trillo Lado, and Yannis Velegrakis. Keyword Search over Relational Databases: a Metadata Approach. In Proc. of SIGMOD 2011, Athens, Greece, June 12-16. ACM, 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. Understanding Linked Open Data through Keyword Searching: the KEYRY approach. 1st International Workshop on Linked Web Data Management (LWDM 2011), in conjunction with the 14th EDBT 2011, Uppsala, Sweden, March 21-25, 2011.

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Raquel Trillo Lado, and Yannis Velegrakis. Keyword-based Search in Data Integration Systems. Extended Abstract, SEBD 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. A Hidden Markov Model Approach to Keyword-based Search over Relational Databases. In ER, 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. KEYRY: a Keyword-based Search Engine over Structured Sources based on a Hidden Markov Model. In ER 2011 (Demo).

Silvia Rota, Sonia Bergamaschi, Francesco Guerra. The List Viterbi training algorithm and its application to Keyword Search over Databases. In CIKM, 2011.

Sonia Bergamaschi, Francesco Guerra, Matteo Interlandi, Raquel Trillo Lado, Yannis Velegrakis: QUEST: A Keyword Search System for Relational Data based on Semantic and Machine Learning Techniques. PVLDB 6(12): 1222-1225 (2013)

Sonia Bergamaschi, Francesco Guerra, Matteo Interlandi, Raquel Trillo Lado, Yannis Velegrakis: Combining user and database perspective for solving keyword queries over relational databases. Inf. Syst. 55: 1-19 (2016)
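Section 3 notes that QUEST fuses different probabilistic frameworks through a Dempster-Shafer based approach. The core operation of Dempster-Shafer theory is Dempster's rule of combination, sketched below under stated assumptions: two hypothetical sources of evidence assign masses to sets of candidate interpretations (I1, I2, I3) of a keyword query, the products of masses of intersecting sets are accumulated, and the mass falling on the empty set (the conflict) is removed by renormalization. How QUEST actually derives and combines its masses is not reproduced here.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


THETA = frozenset({"I1", "I2", "I3"})  # all candidate interpretations
m_a = {frozenset({"I1"}): 0.6, frozenset({"I1", "I2"}): 0.3, THETA: 0.1}
m_b = {frozenset({"I2"}): 0.5, frozenset({"I1"}): 0.3, THETA: 0.2}
for focal, mass in combine(m_a, m_b).items():
    print(sorted(focal), round(mass, 3))
```

Mass left on larger sets (such as `THETA`) expresses ignorance rather than support for a specific interpretation, which is what distinguishes this combination from simply multiplying probabilities.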

  • KEYSTONE COST Action (IC1302)
    The main objective of the Action is to launch and establish a cooperative network of researchers, practitioners and application domain specialists working in fields related to semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning and natural language processing, a network that coordinates collaboration among them to enable research activity and technology transfer in the area of keyword-based search over structured data sources. The coordination effort will promote the development of a new, revolutionary paradigm that provides users with keyword-based search capabilities for structured data sources, as they currently have for documents. Furthermore, it will exploit the structured nature of data sources in defining complex query execution plans by combining partial contributions from different sources. The main objective of the Action is complemented by the following secondary objectives:
    1. Promote the development of novel techniques for keyword-based search over structured data sources.
    2. Facilitate the transfer of knowledge and technology to the scientific community, practitioners and the enterprises.
    3. Build a critical mass of research activities and outcomes that ensures the sustainability of the research themes beyond the Action.

Main References

  • KEYSTONE website
  • COST website
  • Francesco Guerra, Yannis Velegrakis, Jorge S. Cardoso, John G. Breslin: The KEYSTONE IC1302 COST Action. International KEYSTONE Conference 2017: 187-195

  • Re-search Alps project (INEA/CEF/ICT/A2016/1296967)
    The RE-SEARCH ALPS project aims to gather, consolidate, harmonize and make available to different targets (public and private bodies working at local, regional and national level) data about laboratories and research and innovation centers that are active, in particular, in the regions of the seven countries that constitute the Alpine Area. The project aims to support the research and development process by making it possible to know: (a) what the laboratories and research centers do; (b) where they are located; (c) where excellence emerges, according to research fields/themes and numbers (active researchers, published papers, prizes won, running EU projects, patents obtained, etc.); (d) the people working in a specific center and determining its excellence; (e) the network of relations highlighted in their websites.
    The project is structured in three activities and aims at achieving three main results: (1) the definition of a standard set of metadata (based on, and extending, the one defined by the INSPIRE Geoportal) able to represent laboratories and research and innovation centers; (2) the publication of an open dataset describing the laboratories existing in the Alpine Area, with particular reference to the 48 regions constituting the Area; (3) the development of a semantic and multilingual web application that supports users in querying the dataset and visualizing the results. The project is strongly based on the CEF Building Blocks for data translation and user identification. The RE-SEARCH ALPS project consortium is led by the University of Modena and Reggio Emilia (UNIMORE) and includes Data Publica (FR), the University of Milan (UMIL, IT), the French Research Ministry (MESRI, FR) and the Italian Research Ministry (MIUR, IT). A number of Public Administrations, acting as data providers and application users, will be involved in the project through EUSALP (EU Strategy for the Alpine Region), Action Group 1, led by UMIL.
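The metadata set described in result (1) could be represented, in a much simplified form, by a record type covering the questions (a) to (e) above. The sketch below is hypothetical: field names, types and sample data are illustrative assumptions, not the project's actual metadata standard, which is based on and extends the INSPIRE Geoportal metadata. A simple filter shows how such records could support the kind of querying that result (3) targets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchCenter:
    """Hypothetical, simplified metadata record for a laboratory or research center."""
    name: str
    country: str                     # e.g. "IT", "FR"
    region: str                      # (b) where the center is located
    activities: str = ""             # (a) what the center does
    research_fields: List[str] = field(default_factory=list)
    indicators: Dict[str, int] = field(default_factory=dict)  # (c) e.g. papers, patents
    people: List[str] = field(default_factory=list)           # (d) researchers
    related: List[str] = field(default_factory=list)          # (e) linked organizations


def centers_in_field(centers: List[ResearchCenter], topic: str) -> List[str]:
    """Names of centers declaring a given research field, most published first."""
    hits = [c for c in centers if topic in c.research_fields]
    hits.sort(key=lambda c: c.indicators.get("papers", 0), reverse=True)
    return [c.name for c in hits]


# Hypothetical sample records.
centers = [
    ResearchCenter("Lab A", "IT", "Piemonte", research_fields=["databases"],
                   indicators={"papers": 40}),
    ResearchCenter("Lab B", "FR", "Auvergne-Rhône-Alpes",
                   research_fields=["databases", "robotics"], indicators={"papers": 75}),
    ResearchCenter("Lab C", "IT", "Lombardia", research_fields=["robotics"]),
]
print(centers_in_field(centers, "databases"))  # ['Lab B', 'Lab A']
```

Using `default_factory` for the list and dictionary fields gives each record its own containers instead of sharing one mutable default across instances.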

Main References

  • Re-search Alps website
  • Francesco Guerra, Paolo Sottovia, Matteo Paganelli, Maurizio Vincini: Big Data Integration of Heterogeneous Data Sources: The Re-Search Alps Case Study. BigData Congress 2019: 106-110