Search upon UML repositories with text matching techniques

More Info
expand_more

Abstract

As the quantity of software artifacts, mainly source code and software models, stored in repositories increases, the need for their efficient search becomes more important. In this paper we propose content-based query (a.k.a query-by-example) approach for searching software model repositories, in order to retrieve significant models or model fragments. The query-by-example search conveys the user need in form of a model or pattern specified in a coarse way. Our approach incorporates analysis and indexing of models using textual information retrieval techniques, which exploit the knowledge of the metamodel the models conform to. This allows us to explore different segmentation granularities on models and different indexing techniques ranging from simple bag of words, to index structures which integrate metamodel information. We detail the proposed theoretical framework, the implementation of the method upon open-source architectures, and we discuss the results of our experiments upon a public dataset of UML models.