
Research

Interests

Since 2001, when I began doing research as a PhD student, my work has mainly dealt with information management, the integration of heterogeneous information sources, semantics, ontologies, keyword-based search and the Semantic Web.

The main topics I have addressed in my research so far are:

  1. Intelligent Integration of Information and Semantic Web
  2. Software agents and data integration
  3. Keyword search on structured databases

1. Intelligent Integration of Information and Semantic Web
A large amount of structured data is currently available, mainly in relational and RDF formats. In this context, the integration of data from heterogeneous data sources is a major challenge that the research community in databases and artificial intelligence has been addressing. In this area, I participated in the development of MOMIS (Mediator Environment for Multiple Information Sources), a mediator system that aims to provide a uniform interface for accessing and querying heterogeneous information sources.
In this field, my research dealt with the following aspects: 1) Definition of the functional specifications of the MOMIS architecture; 2) Exploitation of a mediator-based system in the area of e-commerce; 3) Data integration in the case of data evolution; 4) Exploitation of virtual views built with a data integration system as domain ontologies; 5) Development of techniques for the semi-automatic annotation of sources; 6) Summarization of the domain of an attribute with a subset of its values; 7) Integration of data and multimedia sources; 8) Integration of data and web services.

2. Software agents and data integration
I participated in the development of MIKS (Mediator agent for Integration of Knowledge Sources), an implementation of the MOMIS data integration system that uses agent technology to improve access to and querying of semi-structured sources.

3. Keyword search on structured databases
As relational data grows more complex and the user base shifts towards less technically skilled users, keyword search becomes an increasingly attractive alternative to traditional SQL queries, mainly because of its simplicity. Unfortunately, this simplicity comes at the price of inherent ambiguity. The challenge of answering a keyword query over a relational database is therefore to discover the database structures that contain the keywords and to explore how these structures are interconnected to form an answer. The discovered structures, together with their interconnections, represent in relational terms the semantic interpretation of the keyword query. Three approaches have been developed to address this issue: KEYMANTIC, based on an extension of the Hungarian Algorithm; KEYRY, based on a Hidden Markov Model; and QUEST, based on different probabilistic frameworks fused with a Dempster-Shafer based approach.
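The idea of mapping keywords to database structures can be illustrated with a small assignment problem. The following Python sketch is only an illustration and not the KEYMANTIC implementation: the term list, the `score` heuristics and the function names are all assumptions made for the example, and the optimum is found by exhaustive search, whereas a Hungarian-style algorithm computes the same optimum in polynomial time.

```python
from itertools import permutations

# Hypothetical database terms (table.column); not taken from any real schema.
TERMS = ["person.name", "person.city", "movie.title", "movie.year"]


def score(keyword: str, term: str) -> float:
    """Toy affinity between a keyword and a database term (illustrative assumption)."""
    column = term.split(".")[1]
    if keyword.lower() == column:
        return 1.0  # the keyword names the column itself
    if keyword.isdigit() and column == "year":
        return 0.8  # a numeric keyword fits the domain of a year column
    if keyword[:1].isupper() and column in ("name", "city", "title"):
        return 0.5  # a capitalised keyword may be a value of a text column
    return 0.1


def best_assignment(keywords, terms):
    """Return the one-to-one keyword-to-term mapping with the highest total score.

    Exhaustive search over all injective assignments; it needs
    len(keywords) <= len(terms) and is only feasible for tiny inputs.
    """
    best, best_total = None, float("-inf")
    for chosen in permutations(terms, len(keywords)):
        total = sum(score(k, t) for k, t in zip(keywords, chosen))
        if total > best_total:
            best, best_total = dict(zip(keywords, chosen)), total
    return best, best_total


if __name__ == "__main__":
    mapping, total = best_assignment(["title", "1999", "Modena"], TERMS)
    print(mapping, total)
```

In this toy setting the keyword "Modena" scores equally well as a person name and as a city, which is a small instance of the ambiguity described above: several assignments can be equally plausible, so a real system has to rank alternative interpretations rather than return a single one.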


Main Results

  • RELEVANT – RELEvant VAlues geNeraTor
    Research on data integration has provided languages and systems able to guarantee an integrated intensional representation of a given set of data sources. A significant limitation common to most proposals is that only intensional knowledge is considered, with little or no consideration for extensional knowledge. We developed a technique to enrich the intension of an attribute with a new sort of metadata: the “relevant values”, extracted from the attribute values. Relevant values enrich schemata with domain knowledge; moreover, they can be exploited by a user in the interactive process of creating or refining a query. The technique, fully implemented in a prototype, is automatic, independent of the attribute domain, and based on data mining clustering techniques and on semantics emerging from data values. It is parametrized with various metrics for similarity measures and is a viable tool for dealing with frequently changing sources, as in the Semantic Web context.
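To give a flavour of what extracting relevant values from an attribute domain might look like, here is a minimal Python sketch. It is not the RELEVANT prototype: the single-link grouping, the use of `difflib.SequenceMatcher` as the similarity measure, the threshold of 0.6 and the rule for picking a representative are all assumptions chosen to keep the example short.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float) -> bool:
    """Case-insensitive string similarity test (one possible metric among many)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def relevant_values(values, threshold=0.6):
    """Group attribute values by single-link similarity and name each group.

    Returns a dict mapping a representative value (the shortest one, ties
    broken alphabetically) to the sorted list of values it stands for.
    """
    clusters = []
    for value in sorted(set(values)):
        # Every existing cluster containing a value similar to this one.
        matching = [c for c in clusters if any(similar(value, m, threshold) for m in c)]
        merged = [value] + [m for c in matching for m in c]
        # Replace the matched clusters with their union.
        clusters = [c for c in clusters if all(c is not m for m in matching)]
        clusters.append(merged)
    return {min(c, key=lambda s: (len(s), s)): sorted(c) for c in clusters}


if __name__ == "__main__":
    domain = ["Database", "Databases", "data base", "Ontology", "Ontologies", "Agent"]
    for representative, members in relevant_values(domain).items():
        print(representative, "<-", members)
```

The similarity function is passed through a single helper, so a different metric can be swapped in without touching the grouping logic, in the spirit of the parametrization mentioned above.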

Main Publications

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori, M. Vincini: “RELEVANT News: a semantic news feed aggregator”, 4th Workshop on Semantic Web Applications and Perspectives (SWAP 2007), Bari, Italy, December 18-20, 2007

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “Extracting Relevant Attribute Values for Improved Search”, IEEE Internet Computing, vol. 11, no. 5, pp. 26-35, Sept/Oct 2007 (special issue on Semantic-Web-Based Knowledge Management), ISSN 1089-7801

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “A new type of metadata for querying data integration systems”, Proceedings of the Convegno Nazionale Sistemi di Basi di Dati Evolute (SEBD2007), Torre Canne (Fasano, BR), 17-20 June 2007, pp. 266-273, ISBN 978-88-902981-0-3

S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: “Relevant values: new metadata to provide insight on attribute values at schema level”, Proceedings of the 9th International Conference on Enterprise Information Systems, Funchal, Madeira – Portugal, 12-16 June 2007, pp. 274-279, ISBN 978-972-8865-88-7

  • Keyword Search over structured databases
    Keyword queries offer a convenient alternative to traditional SQL for querying relational databases with large, often unknown, schemas and instances. The challenge in answering such queries is to discover their intended semantics, construct the SQL queries that describe them and use them to retrieve the corresponding tuples. Existing approaches typically rely on indices built a priori on the database content. This seriously limits their applicability when a priori access to the database content is not possible. Examples include online databases accessed through a web interface, and sources in information integration systems that operate behind wrappers with specific query capabilities. Furthermore, the existing literature has not studied to its full extent the interdependencies among the ways the different keywords are mapped onto database values and schema elements. In this research, we developed techniques and prototypes which are mainly based on metadata.
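As a rough illustration of the step from an interpretation of the keywords to an SQL query, the Python sketch below builds a query from schema metadata alone. It is a simplification under stated assumptions, not one of the prototypes: the two-table schema, the `FOREIGN_KEYS` dictionary, the `build_sql` function and the sample rows are all hypothetical, and the in-memory SQLite database is used only to show that the generated query runs.

```python
import sqlite3

# Hypothetical schema metadata: one foreign key linking two tables.
FOREIGN_KEYS = {("movie", "person"): "movie.director_id = person.id"}


def build_sql(mapping):
    """Build a parametrised SQL query from a keyword interpretation.

    mapping: keyword -> (role, table, column), where role is "attribute"
    (the keyword names a column to return) or "value" (the keyword is a
    value to match in that column). Only metadata is consulted here.
    """
    select, where, params, tables = [], [], [], []
    for keyword, (role, table, column) in mapping.items():
        if table not in tables:
            tables.append(table)
        if role == "attribute":
            select.append(f"{table}.{column}")
        else:
            where.append(f"{table}.{column} = ?")
            params.append(keyword)
    # Add the join condition for every known foreign key between used tables.
    for (left, right), condition in FOREIGN_KEYS.items():
        if left in tables and right in tables:
            where.append(condition)
    sql = f"SELECT {', '.join(select) or '*'} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def demo():
    """Run the generated query on a tiny in-memory database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT,
                            director_id INTEGER REFERENCES person(id));
        INSERT INTO person VALUES (1, 'Rossi'), (2, 'Bianchi');
        INSERT INTO movie VALUES (10, 'First Film', 1), (11, 'Second Film', 2);
    """)
    mapping = {
        "title": ("attribute", "movie", "title"),
        "Rossi": ("value", "person", "name"),
    }
    sql, params = build_sql(mapping)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows


if __name__ == "__main__":
    print(demo())
```

Keyword values are passed as query parameters rather than concatenated into the SQL string, which keeps the generated query safe to send to a source whose content has never been inspected.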

Main Publications

Francesco Guerra, Sonia Bergamaschi, Mirko Orsini, Antonio Sala, Claudio Sartori: Keymantic: A Keyword-based Search Engine using Structural Knowledge. ICEIS (1) 2009: 241-246.

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Mirko Orsini, Raquel Trillo Lado, Yannis Velegrakis: Keymantic: Semantic Keyword-based Searching in Data Integration Systems. PVLDB 3(2): 1637-1640 (2010)

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Raquel Trillo Lado, and Yannis Velegrakis. Keyword Search over Relational Databases: a Metadata Approach. In Proc. of SIGMOD 2011, Athens, Greece, June 12-16. ACM, 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. Understanding Linked Open Data through Keyword Searching: the KEYRY approach. 1st International Workshop on Linked Web Data Management (LWDM 2011), in conjunction with the 14th EDBT 2011, Uppsala, Sweden – March 21-25, 2011.

Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Raquel Trillo Lado, and Yannis Velegrakis. Keyword-based Search in Data Integration Systems. Extended Abstract, SEBD 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. A Hidden Markov Model Approach to Keyword-based Search over Relational Databases. In ER, 2011.

Sonia Bergamaschi, Francesco Guerra, Silvia Rota, and Yannis Velegrakis. KEYRY: a Keyword-based Search Engine over Structured Sources based on a Hidden Markov Model. In ER2011 (Demo).

Silvia Rota, Sonia Bergamaschi, Francesco Guerra. The List Viterbi training algorithm and its application to Keyword Search over Databases. In CIKM, 2011.

Sonia Bergamaschi, Francesco Guerra, Matteo Interlandi, Raquel Trillo Lado, Yannis Velegrakis: QUEST: A Keyword Search System for Relational Data based on Semantic and Machine Learning Techniques. PVLDB 6(12): 1222-1225 (2013)

Sonia Bergamaschi, Francesco Guerra, Matteo Interlandi, Raquel Trillo Lado, Yannis Velegrakis: Combining user and database perspective for solving keyword queries over relational databases. Inf. Syst. 55: 1-19 (2016)

  • KEYSTONE COST Action (IC1302)
    The main objective of the Action is to launch and establish a cooperative network of researchers, practitioners, and application domain specialists working in fields related to semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning and natural language processing, and to coordinate collaboration among them to enable research activity and technology transfer in the area of keyword-based search over structured data sources. The coordination effort will promote the development of a new paradigm that provides users with keyword-based search capabilities for structured data sources, like those they currently have for documents. Furthermore, it will exploit the structured nature of data sources in defining complex query execution plans by combining partial contributions from different sources. The main objective of the Action is complemented by the following secondary objectives:
    1. Promote the development of novel techniques for keyword-based search over structured data sources.
    2. Facilitate the transfer of knowledge and technology to the scientific community, practitioners and enterprises.
    3. Build a critical mass of research activities and outcomes that achieve the sustainability of the research themes beyond the Action.

