Using GQM and TAM to Evaluate StArt – a Tool That Supports Systematic Review

Background: Although Systematic Literature Review (SLR) is a reliable way of conducting a literature review, its process is laborious and composed of repetitive activities. The StArt tool was developed to facilitate and support this process. Objective: As any new technology should be evaluated before its use, the objective of this paper is to present an overview of the tool and describe an evaluation carried out to characterize its usefulness and ease of use. Method: The evaluation, applied twice, was designed using the GQM paradigm and the TAM model. The participants were graduate students who had previous knowledge of SLR and had already applied the SLR process manually. Results: In both evaluations the answers were concentrated on "extremely agree" or "quite agree", both for usefulness and for ease of use. Conclusion: Based on the results, the next actions are improvements related to the "quite agree" answers and an experiment to evaluate StArt in greater depth. Despite these needed improvements, the results suggest that StArt does help researchers conduct SLRs and facilitates the application of the process.


Introduction
Evidence-based software engineering (EBSE) has received great attention in recent years. It focuses on identifying the best research evidence and integrating it with practical experience and human values. In addition, EBSE focuses on applying this knowledge in decision making regarding software development and maintenance [1] [2] [3].
According to Kitchenham et al. [1], the term evidence corresponds to the synthesis of the best studies on a research topic provided in the primary studies of the literature. One way to compose this synthesis is to apply a Systematic Literature Review (SLR), which is a type of secondary study that follows a reliable, rigorous and auditable process [4]. On the one hand, this process gives SLRs advantages over the traditional literature review; on the other hand, it is laborious and error prone when applied only manually. Therefore, tool support is essential to achieve the expected results of an SLR.
In this context, the objective of this paper is to present a tool that supports the SLR process and to discuss two viability studies carried out to evaluate its perceived usefulness and ease of use. The tool is named StArt (State of the Art through Systematic Review) and has been developed at the Federal University of São Carlos (UFSCar), in the Software Engineering Research Laboratory (LaPES).
The paper is organized as follows: Section 2 presents a summary of Systematic Literature Review and the characteristics that differentiate it from the traditional literature review. Section 3 presents the StArt tool by exploring its functionalities and the way it supports the SLR process, as well as how it facilitates some tasks that should be executed during that process. Section 4 provides an overview of related tools found in the literature, Section 5 presents some preliminary studies that were carried out to evaluate StArt and, finally, Section 6 presents the final remarks and further work.

Systematic Literature Review
According to [5], the Systematic Literature Review is supported by a well-defined process that makes it different from the traditional literature review. Some characteristics of the SLR are: it starts by defining a Protocol, which must contain a set of information used during the process execution, including the question being addressed; it is based on a search strategy carefully defined to identify as much of the relevant literature related to the research question as possible; it documents its search strategy so that it can be followed rigorously; it requires that the inclusion and exclusion criteria used to evaluate each potential primary study be explicitly defined in the Protocol; it requires the specification of the quality criteria that should be used to evaluate the content of each primary study; and it must always be conducted when a quantitative meta-analysis is required.
Despite the advantages of an SLR, such as good coverage, replicability and reliability, its process is more laborious than that of an informal literature review [5]. Thus, considering that there are several stages to be executed and several documents to be managed, computational support can facilitate the work and enable higher quality in the execution of the process. Although there are slight differences among the SLR processes described in the literature, they all involve planning, execution, analysis and dissemination of results [5] [6].
In the Planning stage, the aim is to define a Protocol that contains all the information and procedures necessary for the execution of the following stages. Examples of the information and procedures needed are: the research question, the keywords, the search engines and the inclusion and exclusion criteria for studies.
In the Execution stage, three steps must be conducted: the Identification of studies on the search engines defined in the Protocol, the Selection of these studies based on the inclusion and exclusion criteria, and the Extraction of data from the selected studies.
In the Summarization stage, the data extracted from the studies are analysed and summarized in order to answer the research question defined in the Protocol.
After the conclusion of these three stages, it is important to report the results through technical reports or scientific papers to show the state of the art of the topic in focus.

The Tool StArt
Some activities in the SLR process are repetitive and require discipline and systematic practice from the researcher. The information must be registered in an organized way so that the SLR provides the expected results, is replicable, and allows all the information to be packaged. StArt supports the SLR process activities, except for the automated search for primary studies in electronic databases, since such a search is treated as a robot action and is blocked by these databases. Therefore, the researcher must run the search manually on the search engines registered in the Protocol every time a search is necessary. The search result must be exported from the search engine as a BibTeX file, which is then imported into StArt.
Figure 1 presents a screen of the tool. On the left side, a hierarchical tree shows the process stages to be followed. Some pieces of information in this tree are filled in dynamically, as the researcher defines the Protocol or as the process steps are carried out. This feature of StArt helps the researcher keep the information updated and consistent.
The following subsections describe the way each SLR stage is supported by the tool.

Planning
In this stage, the researcher must define the Protocol that will support the other SLR stages. The Protocol fields available in the tool are the ones suggested by [5]. StArt has a help icon that provides a description and an example of each field. As some fields influence other process stages, the tool assists in keeping these relations under control. For instance:
- Source List: this field contains the list of all the search engines that will be used to gather the studies. As they are inserted in the Protocol, the names of the search engines are automatically added to the side-tree of the main screen (Figure 1). The separation of the search engines allows a better organization of the studies as well as control of the information in the studies identification step;
- Keywords: this field contains the keywords that will be used to compose the search strings. When the studies are uploaded into StArt, it uses the keywords to score the studies according to the number of occurrences of these words in their title, abstract and keywords. This score, shown in Figure 2, suggests an order of relevance for the studies;
- Studies Inclusion and Exclusion Criteria Definition: this field, as shown in Figure 1, contains the criteria that will be used to accept or reject each study during the selection step. StArt makes them available in this step and allows the researcher to register the ones that were applied to each of the studies, as shown in Figure 2;
- Information Extraction Form Attributes: this field contains the attributes that will compose the form which must be filled in by the researcher in the Extraction stage, as explained in subsection 3.2.3.
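The keyword-based score described above can be illustrated with a short sketch. The text only states that the score depends on the number of keyword occurrences in the title, abstract and keywords of a study; the equal weighting of the three fields, the case-insensitive substring matching, and the names `keyword_score` and `rank_by_score` are assumptions made for illustration, not StArt's actual implementation.

```python
import re

FIELDS = ("title", "abstract", "keywords")

def keyword_score(study, protocol_keywords):
    """Count how often the Protocol keywords occur in title, abstract and keywords."""
    # Assumption: every occurrence counts once, whatever field it appears in.
    text = " ".join(study.get(field, "") for field in FIELDS).lower()
    return sum(len(re.findall(re.escape(kw.lower()), text)) for kw in protocol_keywords)

def rank_by_score(studies, protocol_keywords):
    """Return the studies ordered from the highest to the lowest score."""
    return sorted(studies, key=lambda s: keyword_score(s, protocol_keywords), reverse=True)

# Hypothetical studies, as they might look after a BibTeX import.
studies = [
    {"title": "Agile methods in practice",
     "abstract": "A survey of teams.",
     "keywords": "agile"},
    {"title": "A tool for systematic review",
     "abstract": "Systematic review is laborious. This tool helps.",
     "keywords": "systematic review; tool"},
]
protocol_keywords = ["systematic review", "tool"]

ranked = rank_by_score(studies, protocol_keywords)
print([(s["title"], keyword_score(s, protocol_keywords)) for s in ranked])
```

A field-specific weighting (for example, counting a hit in the title more than one in the abstract) would be a natural variation, but the text gives no indication of whether the tool does this.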

Execution
Once the Protocol is concluded, the researcher is able to perform the Execution stage, which is composed of three steps: Studies Identification, Selection and Extraction.

Studies Identification
In this step the objective is to gather a set of studies related to the research question. Thus, the researcher should: (i) apply the search strings to each of the search engines specified in the Protocol and export the results in BibTeX format, and (ii) import the BibTeX file into StArt and store the search string used in the search engine, since it is the search string that allows a faithful replication of the SLR. The tool also allows the manual insertion of studies.
Once the BibTeX file has been imported, all the information presented in Figure 2 is available in StArt. This screen shows the search string used in this session, the number of studies identified, and a table with some attributes of each study, such as its identification, title, author(s), status at the Selection step, status at the Extraction step, Reading Priority and Score. The score, as mentioned before, is automatically calculated according to the number of times the keywords defined in the Protocol occur in the title, abstract and keywords of the study. Duplicated studies are also automatically identified by the tool. The two Status fields must be filled in by the researcher, according to the process step.
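The text does not say how StArt recognizes duplicated studies. As one plausible illustration only, the sketch below flags two entries as duplicates when their titles match after normalizing case, punctuation and whitespace. The matching rule, the `Duplicated` status label and the function names are assumptions, not the tool's documented behaviour.

```python
import re

def normalize_title(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def mark_duplicates(studies):
    """Flag every study whose normalized title was already seen (hypothetical rule)."""
    first_seen = {}
    for study in studies:
        key = normalize_title(study["title"])
        if key in first_seen:
            study["status"] = "Duplicated"
            study["duplicate_of"] = first_seen[key]
        else:
            first_seen[key] = study["id"]
    return studies

# Hypothetical entries, as if imported from two different search engines.
imported = [
    {"id": 1, "title": "Evidence-Based Software Engineering"},
    {"id": 2, "title": "evidence based software engineering."},
    {"id": 3, "title": "Systematic Review in Software Engineering"},
]
mark_duplicates(imported)
print([(s["id"], s.get("status", "Unclassified")) for s in imported])
```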

Selection
In this step the primary studies uploaded into StArt must be accepted or rejected according to the inclusion and exclusion criteria defined in the Protocol. Figure 3 illustrates the support provided by the tool for this activity. The decision should be made after reading the title, abstract and keywords of the study, which are available for each study, as shown in Figure 4. At the end of this step all the accepted studies are automatically transferred to the Extraction step. Figure 5 exemplifies this: there are seven papers under Accepted Papers in the Selection step and a total of seven papers in the Extraction step.
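The Selection step can be pictured as a small state change on each study. In the sketch below the status labels, the field names and the functions `classify` and `extraction_list` are hypothetical. It only mirrors what the text states: the researcher records which criteria were applied, and accepted studies move on to the Extraction step. The decision rule (any exclusion criterion rejects the study) is an assumption.

```python
def classify(study, applied_criteria, inclusion, exclusion):
    """Record the criteria applied to a study and derive its Selection status."""
    unknown = set(applied_criteria) - set(inclusion) - set(exclusion)
    if unknown:
        raise ValueError(f"criteria not defined in the Protocol: {sorted(unknown)}")
    study["criteria"] = list(applied_criteria)
    if any(c in exclusion for c in applied_criteria):
        study["selection_status"] = "Rejected"   # assumption: exclusion always wins
    elif any(c in inclusion for c in applied_criteria):
        study["selection_status"] = "Accepted"
    else:
        study["selection_status"] = "Unclassified"
    return study

def extraction_list(studies):
    """Studies accepted in Selection are the ones handed over to Extraction."""
    return [s for s in studies if s.get("selection_status") == "Accepted"]

# Hypothetical criteria, keyed by a short identifier.
inclusion = {"I1": "Presents a tool that supports the SLR process"}
exclusion = {"E1": "Not written in English"}

study_a = classify({"id": 1, "title": "Study A"}, ["I1"], inclusion, exclusion)
study_b = classify({"id": 2, "title": "Study B"}, ["I1", "E1"], inclusion, exclusion)
print([s["id"] for s in extraction_list([study_a, study_b])])
```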

Extraction
In this step, all the studies that were accepted in the Selection step should be read in full and analyzed again; a study that is not relevant to answering the main question defined in the Protocol must be rejected in this step. The Reading Priority field, which can be filled in during the Selection step, may help the researcher with the reading order. Although the full studies must be downloaded by the researcher, they can be linked to the SLR, which facilitates access to the documents. For the papers classified as Accepted in this step, the researcher must extract the information corresponding to the attributes of the Information Extraction Form defined in the Protocol. This form is available in this step, as shown in Figure 5, and promotes a systematic way of extracting information.

Summarization
In this stage the researcher should describe the state of the art of the topic in focus. StArt facilitates access to the information extracted during the Extraction step and provides a text editor to help in writing a first version of the summarization document. When this stage is reached, some data on the whole SLR are available, as shown in Figure 6. In addition, StArt provides some reports that also facilitate the conduct of an SLR.

Related tools
In the literature, there are some tools that support the management of bibliographic references and that are commonly used by researchers to aid in the SLR process. The purpose and the coverage of these tools differ, and they are not related to the SLR process proposed by [5], except for SLR Tool [7].
Only SLR Tool [7] focuses on Systematic Literature Review. However, its installation requires a specific database management system and a pre-configuration of the environment, which can restrict its use, mainly by researchers from other areas, such as Medicine and Nursing, who are also users of the SLR process. Another characteristic of SLR Tool is that it only works with the English and Spanish versions of the Windows operating system. StArt, on the other hand, does not have this restriction and can be easily installed through a wizard interface. Table 1 presents the main characteristics of tools that are being used in the context of literature review.

StArt evaluation: preliminary data on the Usefulness and Ease of Use
According to [8], every proposed technology (method, technique, tool, etc.) should be evaluated before being made available for use. The objective of the evaluation described below was to characterize the two aspects of the TAM (Technology Acceptance Model) [9], in order to obtain preliminary data on the viability of using the tool.
The evaluation was applied twice. On both occasions the participants were graduate students in Computer Science (MSc. and PhD.) who had applied the SLR process manually during the Research Methodology course. Fourteen students participated in the first evaluation and thirty-five participated in the second one.
The evaluation was planned using the GQM (Goal, Question, Metric) paradigm [10][11], which is composed of four steps: Planning, Definition, Data Collection and Interpretation, which are described below.

Planning and Definition
The GQM model constructed for planning the evaluation consists of four goals, thirteen questions (Table 2) and fourteen metrics (Table 3), as detailed in Figure 7. Based on that model, two questionnaires were used in the evaluation: Questionnaire1 (Q1 to Q4), for collecting data on the students' opinions and their current contact with systematic review; and Questionnaire2 (Q5 to Q13), for characterizing the usefulness and ease of use of the tool according to TAM. The questions related to TAM were inspired by the study presented in [12] and were answered on a Likert scale [13]. Both questionnaires contained blank fields for comments. Table 4 presents the interpretation model of the GQM, which should be read as follows: "If Expression then Interpretation". Taking line 9 as an example, where the question is Q4: "If M9 + M10 + M11 ≥ M12 + M13 + M14 then the SR is seen as a key resource for the quality of academic research."

Table 2 (fragment). It was easy for me to become skilful at using StArt

Table 4 (fragment). Interpretation model of the GQM
2. Interpretation: The activity of greatest difficulty is the creation of the search string.
3. Expression: M3 > Mi, i = 1, 2, 4, 5 and 6. Interpretation: The most difficult activity is to search for articles on search engines.
4. Expression: M4 > Mi, i = 1, 2, 3, 5 and 6. Interpretation: The most difficult activity is to select items that will be read in full.
5. Expression: M5 > Mi, i = 1, 2, 3, 4 and 6. Interpretation: The most difficult activity is to extract information from articles read in full.
6. Expression: M6 > Mi, i = 1, 2, 3, 4 and 5. Interpretation: The most difficult activity is to summarise the results of the systematic review.
11. Interpretation: The StArt is easy to use; however, the next step is to analyse the data sent by participants through comments in order to identify improvements needed to facilitate the use of the tool. Once improvements are made, the assessment could be carried out with a new group of participants.
12. Expression: For Qi, i = 5 to 10: M9 ≤ M10 + M11 and M11 ≥ M10. Interpretation: The StArt does not have the 'ease of use', and so the next step is to study heuristics and usability standards defined in the literature to perform a self-assessment on the tool's interface; the comments submitted by participants should be analysed and, based on that, the StArt project should be reviewed and improvements implemented. The assessment should be conducted again with the same group of participants to see if there was any improvement in the results.
13. Expression: For Qi, i = 11 to 13. Interpretation: The StArt is useful to perform a systematic review, and the next step is to conduct an experimental study to confirm the result.
14. Expression: For Qi, i = 11 to 13: M9 ≤ M10 + M11 and M10 ≥ M11. Interpretation: The StArt is useful, but an analysis of the data sent by participants through comments must be done in order to identify improvements needed to make the tool useful. Once these improvements are made, the same assessment could be carried out with a new group of participants.
15. Expression: For Qi, i = 11 to 13: M9 ≤ M10 + M11 and M11 ≥ M10. Interpretation: The StArt is not very useful, and therefore the team's next step is to review the design of the tool in order to identify whether there are features that can be implemented to provide greater utility to the StArt. The assessment should be conducted again with the same group of participants to see if there was any improvement in the results.
16. Expression: For Qi, i = 5 to 10. Interpretation: The implementation of new features in the StArt must be aborted, the comments submitted by participants should be analysed and the design of the tool must be fully reviewed by the team, especially the user interface. Once the tool is modified, the same study should be conducted until the participants judge the tool easy to use.
17. Expression: For Qi, i = 11 to 13. Interpretation: The implementation of new features in the StArt must be aborted, the comments submitted by participants should be analysed and the design of the tool must be fully reviewed by the team. Once the tool is modified, the same study should be conducted until the participants judge the tool useful.

Data Collection
The data were collected as follows: first, Questionnaire1 was sent by e-mail to the participants who, after answering it, had access to two training videos about StArt and permission to download the tool. The students were then asked to explore the tool by carrying out the activities they had performed manually during the course. Next, the students who had completed Questionnaire1 received Questionnaire2, also by e-mail. Because electronic questionnaires were used, all the answers were available in spreadsheets at the end of the evaluation, which facilitated the data collection.
A summary of the data collected is shown in the following tables and figures. Tables 5, 6 and 7 present the questions and answers of the first questionnaire, and Figures 8, 9, 10 and 11 present charts that show the questions and answers of the second one. In the charts, the order of the bars follows the order of the questions.

Interpretation
Applying the interpretation model presented in Table 4 to the data collected and presented in the previous section, we can assume the following about the goals:
• G1: in relation to this goal, the results showed that in the first evaluation filling in the protocol was selected as the most difficult activity of the SLR process, since six participants (42%) selected this option (Table 5) and this value makes expression 1 true (Table 4). In the second evaluation, however, the construction of search strings was selected as the most difficult activity of the SLR process, since fifteen participants (42%) selected this option (Table 5) and this value makes expression 2 true (Table 4). We plan to address this issue in more depth, aiming to provide facilities that can help the researcher in these activities.
• G2: in relation to this goal, the results showed that the participants changed their behaviour in conducting literature reviews. Although most of the participants have not tried to apply the SLR process after the course, they consider the SLR a key resource for the quality of academic research. This conclusion is supported by the following expressions (Table 4):
o expression 7: Q2: M7 ≥ M8, which is true for evaluation 1 (13 ≥ 1) and for evaluation 2 (32 ≥ 3), according to the values in Table 6;
o expression 8: Q3: M7 ≥ M8, which is false for evaluation 1 (6 ≥ 8) and for evaluation 2 (12 ≥ 23), according to the values in Table 6;
o expression 9: Q4: M9 + M10 + M11 ≥ M12 + M13 + M14, which is true for evaluation 1 (7 + 3 + 4 ≥ 0 + 0 + 0) and for evaluation 2 (21 + 6 + 8 ≥ 0 + 0 + 0), according to the values in Table 7.
• G3: in relation to this goal, there are four expressions in the interpretation model (Table 4): 10, 11, 12 and 16. The results showed that in both evaluations most of the participants agree with the ease of use of StArt.
In evaluation 1 the answers of the participants were concentrated on "quite agree" (Figure 8). Considering the answers to questions 5 to 10, expression 11 (Qi, i = 5 to 10: M9 ≤ M10 + M11 and M10 ≥ M11) is true, since 32 ≤ 34 + 12 and 34 ≥ 12. Hence, the next step should be the analysis of the comments submitted by the participants in order to identify improvements needed to facilitate the use of the tool. Once the improvements are made, the evaluation could be carried out with a new group of participants.
In evaluation 2 the answers of the participants were concentrated on "extremely agree" (Figure 9). Considering the answers to questions 5 to 10, expression 10 (Qi, i = 5 to 10: M9 ≥ M10 + M11) is true, since 103 ≥ 70 + 32. Therefore, the development team should conduct an experimental study with a new group of participants to confirm the result of evaluation 2.
• G4: in relation to this goal, there are four expressions in the interpretation model (Table 4): 13, 14, 15 and 17. The results showed that in both evaluations the majority of the participants agree with the usefulness of StArt (Figures 10 and 11). According to the interpretation model (Table 4), the next step is to conduct an experimental study to confirm this result.
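The interpretation model reads as a set of simple rules over answer counts, so the checks quoted above can be reproduced mechanically. The sketch below encodes expressions 7 to 11 with the values reported for both evaluations. The function names are illustrative, and treating M7 and M8 as the counts of the two answer options of Q2 and Q3, and M9 to M11 as the counts of the three agreement levels (with M12 to M14 as the disagreement levels), is a reading inferred from the text rather than taken from the metrics table.

```python
def first_option_prevails(m7, m8):
    """Expressions 7 and 8: M7 >= M8."""
    return m7 >= m8

def agreement_prevails(agree_counts, disagree_counts):
    """Expression 9: M9 + M10 + M11 >= M12 + M13 + M14."""
    return sum(agree_counts) >= sum(disagree_counts)

def ease_of_use_expression(m9, m10, m11):
    """Return 10 or 11, whichever expression holds for questions 5 to 10, else None."""
    if m9 >= m10 + m11:
        return 10
    if m9 <= m10 + m11 and m10 >= m11:
        return 11
    return None  # expressions 12 and 16 are not covered by this sketch

# Values quoted in the text for evaluation 1 and evaluation 2.
evaluations = {
    1: {"q2": (13, 1), "q3": (6, 8),
        "q4": ((7, 3, 4), (0, 0, 0)), "ease": (32, 34, 12)},
    2: {"q2": (32, 3), "q3": (12, 23),
        "q4": ((21, 6, 8), (0, 0, 0)), "ease": (103, 70, 32)},
}

results = {
    number: (
        first_option_prevails(*data["q2"]),   # expression 7
        first_option_prevails(*data["q3"]),   # expression 8
        agreement_prevails(*data["q4"]),      # expression 9
        ease_of_use_expression(*data["ease"]),
    )
    for number, data in evaluations.items()
}
print(results)
```

Expression 10 is tested first because expressions 10 and 11 can both hold when M9 equals M10 + M11; the text does not say which one takes precedence in that case, so this ordering is a choice of the sketch.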

Final remarks and future work
This paper presented the StArt tool, which supports the conduct of the systematic literature review process [14] and provides facilities for reducing the effort of this laborious process. It has been developed in an iterative and interactive way, with constant feedback from users. To direct the next steps, an evaluation was carried out twice, aiming to explore the support the tool gives to all the stages of the SLR process. This evaluation involved students who had already applied the SLR process manually. The evaluation was planned using the GQM model and established four goals: two of them related to the aspects addressed by the Technology Acceptance Model (TAM), namely ease of use and usefulness; one related to the identification of the activity considered the most difficult among the SLR activities; and another related to the investigation of changes in the users' behaviour in conducting literature reviews.
The use of the TAM made the evaluation quick and objective, and disseminated the model among the participants. The use of the GQM brought objectivity to the evaluation and to the definition and elaboration of the forms for data collection. One of the limitations of the evaluation is the number of participants, which does not allow the results to be generalized.
As our main objective was to explore the TAM aspects, the evaluation indicated that StArt is useful, since 71.42% of the participants in evaluation 1 and 81.90% of the participants in evaluation 2 extremely agreed with the usefulness of the tool. For ease of use, in evaluation 1 the answers were concentrated on quite agree (45.23%) and extremely agree (38.09%), and in evaluation 2 the answers were concentrated on extremely agree (49.04%) and quite agree (33.33%).
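The ease-of-use percentages for "extremely agree" can be cross-checked against the counts quoted in the Interpretation section (32 answers in evaluation 1 and 103 in evaluation 2). The sketch below assumes that each participant answered all six ease-of-use questions (Q5 to Q10), so the denominator is the number of participants times six, and that the percentages are truncated rather than rounded to two decimals. Both points are inferences from the reported figures, and the function name is illustrative.

```python
def truncated_percentage(count, total):
    """Percentage of count over total, truncated (not rounded) to two decimals."""
    hundredths = count * 10000 // total  # integer arithmetic avoids float rounding
    return f"{hundredths // 100}.{hundredths % 100:02d}%"

EASE_OF_USE_QUESTIONS = 6  # Q5 to Q10

# (participants, "extremely agree" answers to Q5..Q10) for each evaluation
extremely_agree = {1: (14, 32), 2: (35, 103)}

percentages = {
    evaluation: truncated_percentage(count, participants * EASE_OF_USE_QUESTIONS)
    for evaluation, (participants, count) in extremely_agree.items()
}
print(percentages)
```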
According to the GQM interpretation model, the actions that should be taken are: conducting an experimental study to confirm the results of this evaluation, and analysing the qualitative data sent by the participants in order to identify improvements needed to facilitate the use of the tool. In summary, the evaluation has provided evidence that the tool will be accepted by users to support the conduct of SLRs.

Figure 1. Part of the Protocol highlighting the source list that is added dynamically to the side-tree

Figure 2. Information available in the StArt when the studies are uploaded into the tool

Figure 3. Application of the Inclusion and exclusion criteria

Figure 4. General data of each study

Figure 6. Some final data provided at Summarization stage

Table 4 (fragment). Interpretation model of the GQM, rows 7 to 11
7. Expression: For Q2: M7 ≥ M8. Interpretation: Changes have occurred in how to conduct literature review after learning the process of systematic review.
8. Expression: For Q3: M7 ≥ M8. Interpretation: Most of the students conducted another SLR after the course.
9. Expression: For Q4: M9 + M10 + M11 ≥ M12 + M13 + M14. Interpretation: SR is seen as a key resource for the quality of an academic research.
10. Expression: For Qi, i = 5 to 10: M9 ≥ M10 + M11. Interpretation: The StArt is easy to use, and the next step is to conduct an experimental study with a new group of participants to test the result.
11. Expression: For Qi, i = 5 to 10: M9 ≤ M10 + M11 and M10 ≥ M11.

Table 1. Characterization of related tools

Table 3. Metrics used in the GQM

Table 4. Interpretation model of the GQM

Table 6. Data collected in questionnaire 1 (questions 2 and 3)
Q2) After learning the process of systematic review, have you changed the way you perform literature review?