Using git metrics to measure students’ and teams’ code contributions in software development projects

Many software engineering courses are centered around team-based project development. Analyzing the source code contributions during the projects’ development could provide both instructors and students with constant feedback to identify common trends and behaviors that can be improved during the courses. Evaluating course projects is a challenge due to the difficulty of measuring individual student contributions versus team contributions during the development. The adoption of distributed version control systems like git enables the measurement of students’ and teams’ contributions to the project. In this work, we analyze the contributions within eight software development projects, with 150 students in total, from undergraduate courses that used project-based learning. We generate visualizations of aggregated git metrics using inequality measures and the contribution per module, offering insights into the practices and processes followed by students and teams throughout the project development. This approach allowed us to identify inequality among students’ contributions, the modules where students contributed, development processes with a non-steady pace, and integration practices, rendering a useful feedback tool for instructors and students during the project’s development. Further studies can be conducted to assess the quality, complexity, and ownership of the contributions by analyzing software artifacts.


Introduction
Software engineering is a technical activity that also requires soft skills such as teamwork, initiative, commitment, communication, and time management [1]. Although these skills are essential for software engineering professionals, there is a gap between what is taught in academia and what is needed in the industry [2]. Hence a paradigm shift has emerged in education to help students acquire the knowledge and skills needed in the industry through project-based learning [3]. Nonetheless, evaluating such projects is a challenge for instructors due to the difficulty of objectively measuring individual students' contributions against team contributions because the projects are a product of collaborative effort [4][5][6]. To evaluate students fairly, instructors require metrics to assess the contributions of students.
Given the relevance of teamwork as a key skill in the industry, the adoption of collaboration tools has become essential [7]. One such tool is git, a distributed version control system that facilitates collaborative code writing [8]. Git is open-source and widely used in industry as well as educational settings [9].
The adoption of git in education [10][11][12][13][14] offers the opportunity to gain insights into the amount of work contributed to the project by each team member (e.g., source code added to the repository), as well as behavior patterns (e.g., software engineering practices and the team dynamics), through metrics visualizations. We believe that an assessment based on quantitative git metrics could serve as a helpful feedback tool (and fairer grading scheme) for both instructors and students, providing objective real-time measures of contributions to a development project. Continuous assessment of students' and teams' performance in development projects could ensure active participation from the students and teams during the project course [9,15].
In previous studies, researchers have assessed the distribution of contributions in open-source software and private projects, focusing on analyzing the Pareto principle [16][17][18]. Studies in academic contexts have focused on extracting metrics such as commits, merges, lines of code, number of modified files, types of modified files, time spent, and component and developer entropy [9,12,13,14].
Our goal is to propose an approach for measuring and visualizing students' contributions in software development projects. The metrics extracted from git are commits, merges, and churn, whereas the aggregate measures are inequality indexes and inter-decile ratios. We also extract the daily commits and merges to analyze the contribution frequency. This approach was used to analyze eight undergraduate software development projects with 150 students and 31 collaborative teams. In this work, we extend our previous research [19] by analyzing our approach in greater detail, evaluating three new projects, inspecting the contributions by module, and examining integration practices. We defined the following research questions:
RQ1. How does contribution vary across different software development projects and students? This question allows us to determine how the contributions are distributed for students and their teams, indicating the degree of inequality of these contributions.
RQ2. How does contribution within software development projects evolve? This question permits us to identify trends in the evolution of the contribution concerning the frequency over time for students and teams.
The remainder of the paper is organized as follows: Section 2 summarizes relevant related work, Section 3 describes the methodology, Section 4 presents the results, Section 5 details the implications of our results, and Section 6 outlines our conclusions and future work.

Related work
Here we present previous works that investigate git repositories for contributions in educational contexts with team-based projects.
Cochez et al. [12] analyzed git usage in several educational projects by examining the commit log data, with teams ranging from two to five students. They tracked the commit activity in each class session and over the entire course, the quality of commit messages, and the difference between students' participation and files committed. They found that most students committed regularly during sessions and less frequently outside of them, that commit messages contained a high amount of gibberish, that there was considerable inequality within teams, and that the amount of garbage was related to students' awareness of good development practices. Their findings suggest that, without proper adoption, versioning systems are used merely as an assignment submission tool, leaving out the benefits of collaborative and distributed work.
Mittal and Ashish [13] mined wiki-based document management systems, version control systems, and issue tracking systems to gain insights into practices and procedures in an educational setting with five-member teams. From the wiki, the authors extracted the quality of the messages, consistency of activity, and uniformity of contributions within the team. Using the version control systems, they analyzed the component and developer entropy and the commit behavior near deadlines. From the issue tracking systems, they examined trends for bug creation and fixing, mean time to repair, defective components, and process compliance. They found varying degrees of equality among team members, consistency in the activity, quality, and process compliance.
Kertész [14] adopted GitHub as a teaching tool and collaborative platform for course assignments to enhance critical thinking and collaborative learning. Students used the repositories to check out materials and solve their assignments with examples. The code frequency and quality of assignments were investigated. They found that collaborative learning helped improve the course as it enhanced students' skills and provided an inherent advantage in group working skills.
Parizi et al. [9] proposed an architectural model to evaluate contributions in students' projects in an objective manner. Their model comprises the number of commits, number of merge pull requests, number of files, total lines of code, and time spent per day. Additionally, they proposed to take into account the difficulty of the problem assigned to a developer for a more accurate assessment of effort. They believe that such an assessment can support more productive learning and better prepare graduates for the industry.
Raibulet and Arcelli [20] described their experience of enhancing collaboration and teamwork activities with different software tools in a university course. The tools used were GitHub for collaborative software development, SonarQube for software quality analysis, and Microsoft Project for project management. Feedback on the advantages and disadvantages of using such tools was collected from students through questionnaires, and the students considered the tools useful. They recommend using such tools, as students were enthusiastic about learning new approaches and tools, stimulating the learning process.
Gustavsson and Brohede [15] created a tool that continuously assesses the source code and interactions from GitHub repositories. They collected the files, commits, students, and issues (merges) and extracted the number of commits, merges, lines of code, events, and comments, using the metrics to rank students. They found the tool useful for assessing individual contributions and the course outcome.
Tushev et al. [21] investigated how GitHub as a platform affects students' team dynamics and performance in large software engineering classrooms, including the benefits, challenges, and drawbacks of its use. They conducted entry and exit surveys, as well as an exit interview. They analyzed students' behavior based on the number of commits and quality of the commit messages. They found that the use of GitHub does not affect the quality of the work, but the metrics analyzed could not be used as a reliable proxy for team performance assessment. Furthermore, the social features of GitHub had several limitations.
Buffardi [6] evaluated the relationships between qualitative and quantitative assessments. For the qualitative assessment, they analyzed peer-reviewed evaluations, project scores, and quiz scores, whereas for the quantitative assessment, they measured the number of commits, lines of code changed, user stories completed, and user points completed. They found a positive correlation between the peer-reviewed evaluation of interactions and contributions. Furthermore, there was a lack of strong association between the qualitative and quantitative assessments. They recommend using quantitative metrics as a complement to traditional assessment.
Regarding the inequality measures, they have been previously used and analyzed in open source software projects. Goeminne and Mens [16] have used the inequality indexes (Hoover, Gini, and Theil index) to analyze the distribution of activity in open source projects. They analyzed the distribution of contributions for the number of commits, mails, new bug report submissions, comments added to existing bug reports, and changes to existing bug reports mining the data from version repositories, mailing lists, and bug trackers. They found a highly imbalanced distribution of activity across all contributions. Meanwhile, in the field of econometrics, other inequality metrics have been proposed, such as the inter-decile ratios [22].
The main contribution of our work is an approach for measuring and visualizing students' contributions from git metrics, using aggregated inequality indexes and inter-decile ratios adopted from the field of econometrics. Our analysis shows contributions of students, contributions of teams, contributions of students within their teams, contributions to modules, and contributions through time.

Methodology
This section describes our approach and the process followed to calculate contributions, the metric extraction process, as well as the git repositories and course descriptions.

Measuring Approach and Process
Our approach was designed from the point of view of the instructors. Fig. 1 shows the objective, questions, and metrics used in our analysis [23]. The goal of the instructors was to measure students' contributions. Hence, the following questions were asked: How can we measure contributions? How can we measure contribution inequality? How can we measure contributions over time? The first question helps us identify metrics that represent contribution. The second question allows us to determine whether a project's contribution is equal or unequal, providing information about high and low contributors and team dynamics. The third question enables us to discover patterns of contributions over time. Each of these metrics will be analyzed for the student within the project, the team within the project, and the student within their team.
The approach used to extract the metrics is shown in Fig. 2. There are three main steps: parsing the students' identification, extracting the git metrics, and calculating the metrics from econometrics. First, the git repositories must be selected. In our case, we selected eight undergraduate software development repositories. Then we proceeded with parsing, which begins with grouping students' identifications and removing outliers. Account identifications were grouped using mailmaps 1 to avoid fragmenting the results. Also, students who abandoned the course were excluded. Then, git metrics (including the metrics per day) for each student were extracted. The modules that each student contributed to were mined using the file paths of the commits. The results per student were grouped by team to obtain each team's contribution, and were also split by team to obtain the data for the students within their teams. Finally, the inequality indexes and inter-decile ratios were calculated for each data set.
With the different metrics results, several visualizations, such as heat maps and Pareto charts, were plotted to analyze the contributions throughout the development process.
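The grouping step of the process above (per-student results combined into team totals, and split into per-team student lists) can be illustrated with a minimal Python sketch. The record layout, the names `STUDENTS`, `team_totals`, and `students_within_teams`, and all numbers are hypothetical assumptions made for illustration; they are not the tooling or data used in the study.

```python
from collections import defaultdict

METRICS = ("commits", "merges", "churn")

# Hypothetical per-student results; the names and numbers are illustrative
# assumptions and are not data from the study.
STUDENTS = [
    {"student": "s1", "team": "team1", "commits": 40, "merges": 10, "churn": 5200},
    {"student": "s2", "team": "team1", "commits": 12, "merges": 0, "churn": 900},
    {"student": "s3", "team": "team2", "commits": 25, "merges": 6, "churn": 3100},
    {"student": "s4", "team": "team2", "commits": 30, "merges": 9, "churn": 4100},
]


def team_totals(students):
    """Sum every metric over the students of each team (team within project)."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for row in students:
        for metric in METRICS:
            totals[row["team"]][metric] += row[metric]
    return dict(totals)


def students_within_teams(students):
    """Split the per-student rows into one list per team (student within team)."""
    groups = defaultdict(list)
    for row in students:
        groups[row["team"]].append(row)
    return dict(groups)
```

Each of the three resulting data sets (all students, team totals, and students per team) can then be passed to the same inequality calculations.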

Metrics
The following metrics were extracted from git repositories:
• Commits: The number of commits a student pushed to the repository. To obtain this, the git log 2 command was used with the options --use-mailmap, --author, --all and --no-merges. Each output line then represents one commit, and the lines are counted per developer. Commits do not include merges.
• Merges: The number of merges a student has integrated into the repository. This is obtained using the git log command with the options --use-mailmap, --author, --all and --merges.
• Churn: The lines of code added and deleted by a student, including comments. To obtain this, the git log command was used with the options --use-mailmap, --author, --numstat, --all, and --pretty=tformat. Then, the added and deleted lines of code were summed.
• Daily commits or merges: The number of commits or merges a student contributes per day. It was obtained with the git log command and the options --use-mailmap, --author, --date=iso8601, --pretty=format:%cd and --all. We added the option --merges for merges and --no-merges for commits. The number of times each date appears is counted and used as the metric.
We only considered date intervals according to projects' sprints and the courses' total duration, using the git options --since and --until.
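The extraction described above could be scripted roughly as follows. This is a sketch under stated assumptions rather than the scripts used in the study: the helper names are hypothetical, a one-line-per-commit format (`--pretty=format:%H`) is assumed for counting commits, and the empty format `--pretty=tformat:` is assumed for churn so that only the `--numstat` lines are printed.

```python
import subprocess
from collections import Counter


def git_log(repo, author, *options):
    """Run `git log` for one author over all branches and return its output."""
    command = ["git", "-C", repo, "log", "--use-mailmap",
               f"--author={author}", "--all", *options]
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def count_commits(log_text):
    """Count the non-empty lines of a one-line-per-commit log."""
    return sum(1 for line in log_text.splitlines() if line.strip())


def churn(numstat_text):
    """Sum added and deleted lines from `--numstat` output (added<TAB>deleted<TAB>path)."""
    total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git reports "-" for binary files
        total += int(added) + int(deleted)
    return total


def daily_counts(dates_text):
    """Count how many times each calendar day appears in ISO-formatted dates."""
    days = (line.strip()[:10] for line in dates_text.splitlines() if line.strip())
    return dict(Counter(days))


# Example calls (assumed repository path, author, and sprint dates):
# commits = count_commits(git_log("repo", "Student", "--no-merges", "--pretty=format:%H",
#                                 "--since=2021-03-01", "--until=2021-03-15"))
# merges = count_commits(git_log("repo", "Student", "--merges", "--pretty=format:%H"))
# lines = churn(git_log("repo", "Student", "--numstat", "--pretty=tformat:"))
# per_day = daily_counts(git_log("repo", "Student", "--no-merges",
#                                "--date=iso8601", "--pretty=format:%cd"))
```

Keeping the parsing functions separate from the `git` invocation makes them easy to check against small hand-written log excerpts.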
Then, we proceeded to calculate the aggregate inequality metrics. These inequality metrics come from the econometrics field, where they are used to measure income distribution. However, they have been applied in other settings to measure distributions [24]. In our study, we used them to analyze the distribution of student contributions in projects. The following inequality metrics were calculated:
• Inequality indexes: We used the Hoover index, Gini index, and Theil index, as defined by Goeminne and Mens [16]. The value of these measures ranges from 0 to 1, with 0 indicating perfect equality and 1 perfect inequality. They were calculated with R 3 and the packages REAT 4 for the Hoover index, and ineq 5 for the Gini and Theil indexes.
• Inter-decile ratios: We used the ratio of the richest share of the population to the poorest share to analyze the inequality in the tails of the population, as defined by Summer et al. [22], indicating how many times the richest are wealthier than the poorest. In our case, wealth is measured in terms of contributions.
We used the 20/20 ratio (contributions of the 20% top contributors versus those of the 20% bottom contributors) to give insight into the disparity between the top and bottom contributors. Additionally, we used the Palma ratio (10/40) [22] because inequality changes in income are affected mostly by these populations; hence, we are interested in analyzing this metric in software development. Finally, the 50/50 ratio was chosen to analyze the top half vs. the bottom half contributors. This ratio differs from previous ones as it includes the entire population and not just the tails.
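For readers who want to reproduce the indexes outside R, the following Python sketch uses the standard textbook definitions of the three indexes. It is not the implementation of the REAT or ineq packages, and dividing the Theil index by ln(n) to bring it into the 0 to 1 range is an assumption made here to match the range stated above.

```python
import math


def gini(values):
    """Gini index from the rank-weighted sum of the sorted contributions."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * rank - n - 1) * x for rank, x in enumerate(xs, start=1))
    return weighted / (n * total)


def hoover(values):
    """Hoover index: share of contributions to redistribute for perfect equality."""
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    return sum(abs(x - mean) for x in values) / (2 * total)


def theil(values, normalized=True):
    """Theil index; zero contributions add nothing (limit of x*ln(x) at 0)."""
    n, total = len(values), sum(values)
    if n < 2 or total == 0:
        return 0.0
    mean = total / n
    index = sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n
    # Assumption: normalize by ln(n), the maximum for n contributors.
    return index / math.log(n) if normalized else index
```

With these definitions, a team in which every student has the same number of commits scores 0 on all three indexes. For a finite group of n contributors, the Gini and Hoover indexes reach at most (n - 1)/n when a single student makes every contribution.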

Courses and git repositories
Our analysis includes eight software engineering projects, with a total of 150 students and 31 collaborative teams, developed in undergraduate courses at the Universidad de Costa Rica. The courses (and their projects) lasted one semester each (equivalent to 4 months of software development). The git repositories used in this study were from two courses: the mobile (M) course, with projects M1 to M3, and the software engineering (SE) course, with projects SE1 to SE5. In these courses, all students worked on a single shared project in collaborative teams (scrums of scrums 6). Each team was in charge of a different module or theme with shared features, all of which had to be integrated into the repository. Projects were developed using agile methodologies: mostly Scrum with some added practices from Extreme Programming (XP). The main methodological difference between the two courses was the sprint 0 in SE, where students focused on the conception and general design rather than functional software (non-existent in M courses). As not all students had the knowledge required to use git-based repositories, they were taught how to use this technology. Details of student projects are shown in Table 1. In total, there were 31 teams, with most projects having four teams. There were in total 9221 commits, 4490 merges, and 10,062,406 churn. For mobile projects, the averages were 412.67 commits, 123.33 merges, and 102,677.67 churn. In the case of software engineering projects, the averages were 1596.6 commits, 824 merges, and 1,950,874.6 churn.

Results
In this section, we present the results. Using the proposed approach, we answer each research question along the following dimensions: student contribution to the project, team contribution to the project, and student contribution to the team. Tables 2, 3, and 4 show the statistics of the students' contributions to the project. Fig. 3 shows the students' contributions to projects, where each subfigure represents a project and contains three histograms (for commits, merges, and churn). In each histogram, the horizontal axis contains 30 intervals of contributions, and the vertical axis displays the number of students whose amount of contributions falls within each horizontal interval. These results show that students contributed unequally to projects.

Student contribution to the project
We analyze the projects per course, as direct comparisons between courses cannot be made due to their technological differences. Results for the projects M1, M2, and M3 show significant differences in the number of contributions. The difference in the total quantity of contributions between M2 and the other two projects, M1 and M3, was because project M2 had fewer students. Project M1 had half the commits and merges of project M3, while project M3 had half the churn of project M1. This may indicate that project M1 delivered more functionality while project M3 had a more stable process. The projects show a right-skewed distribution of contributions, except for the commits of M2. Inequality of contributions was more evident in merges: 7 students out of 28 for projects M1 and M2 made zero merges, meaning that 25% of students did not participate in integration efforts. Another key inequality indicator was the difference between the highest and lowest contributors. For M1, M2, and M3, the ratios of contribution between the top and bottom contributors were 61, 35, and 6.92 for commits; NA (no contribution from the bottom contributor), NA, and 9 for merges; and 439.37, 207.224, and 4.77 for churn, respectively. Considerable variations in contribution were found across all projects and contribution types, being at least 13.44 for commits, 3.03 for merges, and 2,110.16 for churn. Only in SE2 and SE5 was there a student who was the top contributor across all the contributions, while only in SE2 was there a student who was the bottom contributor for all the contributions. Differences in contribution do not necessarily imply differences in effort, as there might be students with fewer contributions who implemented code of greater difficulty in each contribution. Therefore, it is essential to take into consideration other aspects of the contributions.
On the other hand, the SE1, SE2, SE3, SE4, and SE5 projects had at least twice as many contributions as M1, M2, and M3. Higher churn contributions in SE projects can be attributed to development frameworks generating code automatically. For commits and merges, the higher contribution might be because SE is a required course and M is an elective course; hence, students put more effort into the SE course. Moreover, most distributions of contributions were right-skewed, except for commits and merges in SE3, which were close to a uniform distribution. For SE1, SE2, SE3, SE4, and SE5, the ratios of contribution between the top and bottom contributors for commits were 11.72, 10.06, 4.88, 6.46, and 9.93, respectively, and for merges were 6, 12.86, 9.11, 4.18
Fig. 4 shows the Venn diagrams for commit and merge contributions per module. In mobile projects, the modules were the code, the user interface (UI), and the tests. For the SE projects, the modules were the models, controllers, views, tests (unit, integration, and user interface tests), and database (DB). In mobile projects, M2 and M3 behaved differently from M1. In M2 and M3, students mostly contributed to code, followed by the UI and then tests. For both projects, most contributions involved both the code and the UI. Meanwhile, M1 still had code as the biggest module, followed by the UI and, in last place, tests. However, changes to tests occurred only in changes that also modified the code, which may be because the code was not testable and required changes. For SE projects, the modules with the most contributions across all projects were the views, followed by the controllers. After these modules, models received more contributions than the DB in projects SE2, SE4, and SE5, while in SE1 and SE3 students contributed more to the DB than to the models. The tests module received the fewest contributions in all the projects. In all these projects, changes to only the views module were the most frequent changes.
The number of cross-module changes was high, indicating that commits and merges tend to involve multiple modules. For changes to multiple modules, the most frequent ones for SE1 and SE3 were, in order, changes to all modules, to controllers and views, and to DB and models, indicating a high prevalence of changes modifying multiple modules with an emphasis on similar modules. In SE2, the most frequently changed combinations were the DB and models, all modules, and the controllers, views, and models. Finally, for SE4 and SE5 the order was controllers and views, DB and models, and all modules, indicating fewer cross-module changes in general than in the other projects.
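Counting which modules change together, as in the cross-module analysis above, can be sketched as follows. The path prefixes in `MODULE_PREFIXES` are hypothetical, since the actual directory layout of the projects is not given here, and the list of file paths per commit is assumed to come from something like `git log --name-only`.

```python
from collections import Counter

# Hypothetical directory prefixes for an SE-style project; the real layout of
# the analyzed repositories is not stated, so these are assumptions.
MODULE_PREFIXES = {
    "Models/": "models",
    "Controllers/": "controllers",
    "Views/": "views",
    "Tests/": "tests",
    "Database/": "DB",
}


def modules_touched(paths):
    """Return the set of modules touched by the file paths of one commit."""
    touched = set()
    for path in paths:
        for prefix, module in MODULE_PREFIXES.items():
            if path.startswith(prefix):
                touched.add(module)
    return frozenset(touched)


def count_module_combinations(commits):
    """Count how often each combination of modules is changed together."""
    counts = Counter()
    for paths in commits:
        combination = modules_touched(paths)
        if combination:  # skip commits that touch no known module
            counts[combination] += 1
    return counts
```

Each key of the resulting counter corresponds to one region of a Venn diagram: single-module sets are changes confined to one module, and larger sets are cross-module changes.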
Inequality indexes for commits, merges, and churn are shown in Table 5. An index is classified as low inequality if it is less than 0.34, moderate if it is more than or equal to 0.34 and less than 0.67, and high if it is more than or equal to 0.67. A value of 0 indicates perfect equality, and values close to 1 indicate higher inequality. Overall, results show low to moderate inequality for commits and merges, and moderate to high inequality for churn. In particular, SE projects show low inequality for commits and merges, indicating that students had a similar total number of commits and merges by the end of the course. This could be partially explained by the fact that in these projects development practices such as continuous integration and steady pace were emphasized. Yet, for churn, inequality was moderate to high. For mobile projects, commits and merges had a low to moderate level of inequality. Churn had a moderate to high inequality for M1 and M2, while M3 had low inequality. This low inequality might be due to the development practices emphasized and to M3 having the least churn, which leads to less variation between students. The high churn inequality exhibited across courses demonstrates that inequality was not dependent on development tools.
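The classification thresholds stated above translate directly into a small helper. The function name is a hypothetical choice for illustration.

```python
def classify_index(value):
    """Label an inequality index in [0, 1] using the thresholds given in the text."""
    if value < 0.34:
        return "low"
    if value < 0.67:
        return "moderate"
    return "high"
```

For example, an index of 0.34 is already labeled moderate, because the lower bound of that band is inclusive.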
Figure 4: Venn diagrams of the modules contributed to by students in commits and merges.
The inter-decile ratios for commits, merges, and churn by project are shown in Fig. 5. Bars represent the ratio scaled to a percentage, with green and blue representing bottom and top contributors, respectively. The number at the right of each bar corresponds to the ratio. Results show that the tails of contribution distributions were considerably unequal. The inequality for the 20/20 and 50/50 ratios is classified as low if it is less than or equal to 2, moderate if it is more than 2 and less than or equal to 3, and high if it is more than 3. For the 20/20 ratio, commits ranged between 3.16 and 34.5, merges between 2.84 and NA (bottom contributors did not contribute at all), and churn between 3.48 and 151.87. The projects exhibit high inequality across all contribution types, except for merges in SE1 and SE4, which had moderate inequality. Churn had a higher inequality in the tails than commits and merges, as all projects except M3 had a churn inter-decile ratio higher than 14.96, indicating high inequality. For the Palma ratio, commits ranged between 0.6 and 2.13, merges between 0.77 and 7.75, and churn between 1 and 55.69. M3, SE3, and SE4 for both commits and merges, and SE1 for merges, had values under 1, indicating low inequality between the top 10% and the bottom 40%. Merges for M1 were highly unequal, as the ratio was 7.75. For churn, M3 had low inequality and the rest of the projects had high inequality, ranging from 3.56 to 55.69. Lastly, for the 50/50 ratio, commits ranged between 1.79 and 3.93, merges between 1.82 and 9.62, and churn between 2.07 and 42.56. Commits and merges had low to moderate inequality for M3, SE1, SE2, SE3, and SE4, with values ranging from 1.79 to 2.28, while M1, M2, and SE5 had high inequality in commits and merges, ranging from 3.03 to 9.62. The churn ratio for all projects except M3 was over 4.96, indicating high inequality, while the churn inequality for M3 was moderate. The results therefore suggest significant differences between the top and bottom contributors. Only M3 had atypically low inequality across all ratios. This low inequality may be due to the development practices emphasized in the course.
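The ratios discussed above can be sketched in Python as follows. How a percentage share is turned into a whole number of contributors is not specified in the text; rounding up, so that a share is never empty, is an assumption of this sketch, as are the function names. Returning `None` when the bottom share contributed nothing mirrors the NA entries reported above.

```python
def share_size(n, percent):
    """Contributors in a percentage share of n, rounded up (assumed rule)."""
    return max(1, -(-n * percent // 100))


def inter_decile_ratio(values, top_percent, bottom_percent):
    """Contributions of the top share divided by those of the bottom share."""
    xs = sorted(values)
    n = len(xs)
    top = sum(xs[n - share_size(n, top_percent):])
    bottom = sum(xs[:share_size(n, bottom_percent)])
    if bottom == 0:
        return None  # reported as NA: the bottom contributors did nothing
    return top / bottom


def classify_ratio(ratio):
    """Label a 20/20 or 50/50 ratio using the thresholds given in the text."""
    if ratio is None:
        return "NA"
    if ratio <= 2:
        return "low"
    if ratio <= 3:
        return "moderate"
    return "high"
```

Calling `inter_decile_ratio(values, 20, 20)` gives the 20/20 ratio, `(values, 10, 40)` the Palma ratio, and `(values, 50, 50)` the 50/50 ratio. With an odd number of contributors, rounding up makes the two halves of the 50/50 ratio share the middle contributor, which is a limitation of the assumed rounding rule.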

Team contribution to the project
The contributions of teams to projects are shown in Fig. 6, where each subfigure represents a project and contains three Pareto charts (for commits, merges, and churn). In each chart, the horizontal axis displays the teams in decreasing order of contributions, the left vertical axis shows the frequency of contributions, and the right vertical axis represents the cumulative percentage of contributions. Overall, these results show high inequality in churn contributions, with moderate inequality in commits and merges. Top contributing teams vary across contribution types. M1 and M2 had high inequality for churn, and moderate to high inequality for commits and merges. Meanwhile, M3 had low inequality across all contributions. M1's highest and lowest contributing teams yielded 150 and 47 commits, 52 and 8 merges, and 57,421 and 27,394 churn, respectively. M2's highest and lowest contributing teams yielded 81 and 30 commits, 16 and 6 merges, and 45,985 and 11,383 churn, respectively. M3's highest and lowest contributing teams yielded 198 and 154 commits, 86 and 52 merges, and 29,246 and 15,233 churn, respectively. The cumulative percentage of contributions also evidences high inequality. Two of the four teams for M1 contributed 75% of commits, 80% of merges, and 66% of churn. In M2, one of three teams contributed 53% of commits, 55% of merges, and 62% of churn. Lastly, for M3, two of four teams contributed 54% of commits, 57% of merges, and 63% of churn.
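The data behind a Pareto chart of this kind (teams in decreasing order with a running cumulative percentage) can be computed with a short sketch. The function name and the team totals used in the usage note are hypothetical and are not values from the study.

```python
def pareto_table(contributions):
    """Return (team, count, cumulative %) rows in decreasing order of count."""
    ordered = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    total = sum(contributions.values())
    rows, running = [], 0
    for team, count in ordered:
        running += count
        cumulative = round(100 * running / total, 1) if total else 0.0
        rows.append((team, count, cumulative))
    return rows
```

For illustrative totals of `{"A": 40, "B": 30, "C": 20, "D": 10}` commits, the cumulative column reads 40.0, 70.0, 90.0, and 100.0, so the top two of four teams account for 70% of the commits.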
For SE projects, the inequality for commits and merges was moderate, but the churn inequality was high. The highest and lowest contributing teams for SE1 yielded 688 and 371 commits, 286 and 190 merges, and 930,438 and 126,044 churn, respectively. SE2's highest and lowest contributing teams yielded 395 and 167 commits, 238 and 84 merges, and 182,134 and 54,482 churn, respectively. SE3's highest and lowest contributing teams yielded 465 and 292 commits, 310 and 150 merges, and 1,011,330 and 168,457 churn, respectively. The highest and lowest contributing teams for SE4 yielded 574 and 234 commits, 291 and 92 merges, and 3,566,917 and 115,479 churn, respectively. Lastly, the highest and lowest contributing teams for SE5 yielded 829 and 229 commits, 463 and 183 merges, and 529,968 and 32,420 churn, respectively. SE1's cumulative percentage shows that two of the five teams produced 49% of commits, 46% of merges, and 77% of churn. In SE2, two of the four teams contributed 66% of commits, 68% of merges, and 70% of churn. For SE3, two of the four teams generated 59% of commits, 62% of merges, and 78% of churn. Two of the four teams for SE4 contributed 62% of commits, 66% of merges, and 93% of churn. Lastly, one of the three teams for SE5 contributed 59% of commits, 55% of merges, and 80% of churn. Only SE5 had the same team as the highest contributor across all contribution types.
Table 6 contains the inequality indexes of team contributions to projects. These results show that inequality among teams was low to moderate. For SE projects, inequality indexes indicate low inequality for commits and merges. The churn inequality for SE2 and SE3 was low, while for SE1, SE4, and SE5 it was low to moderate. Hence, there was low to moderate inequality among teams' contributions. All mobile projects had low inequality across all contributions; in particular, M3 had very low inequality.
M2 had higher inequality indexes than M1 for commits and merges, but lower inequality indexes for churn. The indexes of team contributions to projects were lower than the indexes of student contributions to the projects. This may indicate that top and bottom contributors were distributed across teams. The inter-decile ratios of team contributions per project are shown in Fig. 7. The inequality in team contributions was lower than in student contributions, yet there was considerable inequality among teams. For the 20/20 ratio, commits ranged between 1.29 and 3.62, merges between 1.51 and 6.5, and churn between 2.1 and 30.89. Commits and merges for M1, commits for SE5, and merges for SE4 indicated high inequality, while the other projects' commits and merges indicated low to moderate inequality. The churn inequality for M1, M3, and SE2 was low to moderate, ranging from 2.1 to 2.3, while for M2, SE1, SE2, SE3, SE4, and SE5 it was high, being at least 3.34. For the Palma ratio, commits ranged between 0.61 and 1.58, merges between 0.77 and 3.06, and churn between 1.04 and 12.45. Projects M3, SE1, SE3, and SE4 had low inequality for commits. The merges inequality for M3 and SE1 was low, for SE1 was high, and for the rest of the projects was moderate. Churn's inequality was high for SE1, SE4, and SE5, and low for the other projects. Because the Palma ratio targets population shares that correspond to less than one team in these projects, and a complete team is required for the calculation, the measure effectively compared the top 20% to 33% with the bottom 40% to 67% of the teams in each project. For the 50/50 ratio, commits ranged between 1.18 and 2.98, merges between 1.19 and 3.88, and churn between 1.7 and 13.38. M1 and SE5 had a moderate inequality for commits, while the other projects had a low commit inequality. The merge inequality was high for M1, moderate for SE2, and low for the other projects. Churn had low inequality for M1 and M3, moderate inequality for M2 and SE2, and high inequality for SE1, SE3, SE4, and SE5.
As most team contribution ratios were lower than the corresponding student contribution ratios, this may corroborate that top and bottom performers were distributed across teams.
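As a rough illustration of how ratios such as the 20/20 and Palma ratios can be computed for small groups, the sketch below divides the total contributed by the top group by the total contributed by the bottom group. The function name `group_ratio`, the rounding of group sizes up to whole members, and the sample values are assumptions made for illustration; the rounding is consistent with the 20-33% and 40-67% ranges mentioned above but is not confirmed as the exact procedure used in this study.

```python
def group_ratio(values, top_pct, bottom_pct):
    """Total of the top group divided by the total of the bottom group.

    Group sizes are rounded up to whole members (an assumption), so each
    group always contains at least one contributor. Returns None, which
    can be reported as N/A, when the bottom group contributed nothing.
    """
    ordered = sorted(values)
    n = len(ordered)
    top_n = (top_pct * n + 99) // 100        # integer ceiling of top_pct% of n
    bottom_n = (bottom_pct * n + 99) // 100  # integer ceiling of bottom_pct% of n
    top_total = sum(ordered[n - top_n:])
    bottom_total = sum(ordered[:bottom_n])
    if bottom_total == 0:
        return None
    return top_total / bottom_total


# Hypothetical commit counts for a five-member group.
commits = [4, 10, 12, 16, 38]
ratio_20_20 = group_ratio(commits, 20, 20)  # top member vs. bottom member
palma = group_ratio(commits, 10, 40)        # top member vs. bottom two members
```

With five members, the top 10% rounds up to one member and the bottom 40% to two, which is why the Palma ratio for small groups compares a larger share of the population than its nominal definition suggests.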

Student contribution to the team
The contributions of students within their teams for the projects are shown in Fig. 8. Overall, we found varying degrees of inequality within teams. In M1, minimum contributions ranged from 1 to 20 commits, 0 to 9 merges, and 73 to 3,928 churn, while the maximum contributions ranged from 19 to 61 commits, 3 to 18 merges, and 13,278 to 32,074 churn. Team 4's minimum contributor made more commits than the maximum contributors of teams 1 and 2, and more merges than the maximum contributor of every other team. Hence, a student being a bottom contributor in their team does not necessarily make them a bottom contributor in the project. In the teams with five students, the two students who contributed the most in each contribution type had a cumulative percentage ranging from 63% to 68% of the commits, 56% to 100% of the merges, and 65% to 88% of the churn. The high cumulative percentage for merges in teams 2 and 3 was due to three and two students, respectively, not merging in their team. In the team with four students, two students contributed 68% of the commits, 60% of the merges, and 79% of the churn. The top contributor was the same across all contributions for two teams and was the same for two types of contributions for two teams. The bottom contributor was the same for all contributions in two teams and was the same for two contributions in two teams. Furthermore, in team 3 the student with the highest commits and churn was also one of the lowest contributors in merges, indicating that being a top contributor in one metric does not necessarily mean being a high performer across all contributions.
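The cumulative percentages reported here can be read as the share of a team's total that comes from its top contributors. A minimal sketch follows, assuming per-student totals are already available; the function name `top_share` and the sample values are hypothetical.

```python
def top_share(values, k=2):
    """Fraction of the team total contributed by the k largest contributors.

    Returns None when the team has no contributions of this type at all.
    """
    total = sum(values)
    if total == 0:
        return None
    return sum(sorted(values, reverse=True)[:k]) / total


# Hypothetical merge counts for a five-member team in which three
# students never merged: the top two account for all merges.
merges = [0, 0, 0, 7, 11]
share = top_share(merges)  # 1.0, i.e. 100% of the merges
```

This mirrors the situation described above, where a cumulative percentage of 100% for merges arises because the remaining team members did not merge at all.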
For M2, minimum contributions ranged from 1 to 19 commits, 0 to 1 merges, and 219 to 2,614 churn, while the maximum contributions ranged from 20 to 35 commits, 5 to 8 merges, and 9,031 to 45,382 churn. The cumulative percentage of contributions had one student per team contributing from 42% to 83% of the commits, 50% to 83% of the merges, and 55% to 99% of the churn. Though teams 1 and 2 had high inequality in the contributions within the team (with contributions for the top developer being at least 67%), team 3 had more equal contributions ranging from 42% to 55%. Therefore, the inequality in contributions was different for each team. The top contributor was the same across all contributions for two teams and was the same for two types of contributions in one team. The bottom contributor was the same student across all contributions for all of the teams.
In M3, minimum contributions ranged from 13 to 21 commits, 3 to 10 merges, and 1,890 to 3,226 churn, while the maximum contributions ranged from 46 to 90 commits, 15 to 27 merges, and 3,006 to 9,010 churn. In the teams with five students, two students contributed between 51% and 53% of the commits, 52% and 61% of the merges, and 46% and 61% of the churn. In the team with four students, two students contributed 71% of the commits, 72% of the merges, and 70% of the churn. The top contributor for all contributions was the same in two teams, was the same for two contributions in one team, and was different for all contributions in one team. The bottom contributor for all contributions was the same in two teams, was the same for two contributions in one team, and was different for all contributions in one team. The team with different students as the bottom contributor across the contribution types was not the same team with different students as top contributors across the contribution types; hence, having distributed bottom contributors does not necessarily indicate having distributed top contributors.
Regarding SE projects, SE1 minimum contributions ranged from 32 to 70 commits, 20 to 33 merges, and 2,744 to 10,227 churn. The minimum commit contribution for team 1 differed substantially from the other teams: it was 70 for team 1, but ranged from 32 to 41 in the other teams. A similar pattern was found in the minimum churn contributions, which ranged from 10,122 to 10,227 for teams 1 and 5, but from 2,744 to 3,912 for the other teams. Therefore, the contributions of bottom contributors in teams may not always be similar. The maximum contributions ranged from 135 to 375 commits, 50 to 120 merges, and 65,195 to 542,473 churn. The cumulative percentage of contributions had two students out of five contributing 55% to 67% of the commits, 50% to 60% of the merges, and 60% to 88% of the churn. The cumulative churn percentage was substantially lower for team 5 (60%) than for the other teams (84% to 88%); hence, churn was distributed much more evenly in team 5 than in the other teams. The top contributor for all contributions was the same in one team, was the same for two contributions in two teams, and was different for all contributions in two teams. The bottom contributor was the same across all contribution types for one team, was the same for two types of contributions for two teams, and was different for each contribution type for two teams. The team that only had one student as the top contributor across all contributions was also one of the teams with distributed bottom contributors. On the other hand, one of the teams with distributed top contributors was also the only team with the same student as the bottom contributor for all contribution types. Therefore, having distributed top contributors does not indicate that the bottom contributors are also distributed, and vice versa.
In SE2, minimum contributions ranged from 16 to 32 commits, 7 to 17 merges, and 3,519 to 5,324 churn, while the maximum contributions ranged from 58 to 161 commits, 28 to 90 merges, and 18,807 to 87,595 churn. In the team with seven students, two students contributed 54% of the commits, 52% of the merges, and 63% of the churn. In the teams with five students, two students' contributions ranged from 50% to 57% of the commits, 52% to 56% of the merges, and 53% to 62% of the churn. Finally, in the team with four students, two students contributed 75% of the commits, 69% of the merges, and 93% of the churn. The top contributor was the same across all contribution types for two teams and was the same for two types of contributions for two teams. The bottom contributor was the same across all contributions for one team and was the same for two types of contributions in three teams. We also found that the teams with only one student as the top contributor of all the contributions were not the same teams with only one student as the bottom contributor.
In SE3, minimum contributions ranged from 24 to 48 commits, 9 to 31 merges, and 1,394 to 18,899 churn, while the maximum contributions ranged from 94 to 117 commits, 51 to 82 merges, and 123,887 to 742,620 churn. The cumulative percentage of contributions for two students in teams of four ranged from 62% to 64% of the commits, 61% to 73% of the merges, and 87% to 93% of the churn. In teams of six, two students' contributions ranged from 41% to 52% of the commits, 46% to 47% of the merges, and 67% to 75% of the churn. The cumulative churn percentage seems to be substantially lower in teams with six members than in teams with four members. The top contributor was the same student for two types of contributions in three teams, and was different for each contribution type in one team. The bottom contributor was the same for all contributions in three teams and was the same for two contributions in one team.
For SE4, minimum contributions ranged from 26 to 46 commits, 17 to 35 merges, and 2,421 to 17,018 churn, while the maximum contributions ranged from 89 to 168 commits, 32 to 71 merges, and 46,022 to 3,494,945 churn. The minimum contributor of team 3 outperformed the top contributor of team 2 in merges. In teams of five, two students' contributions ranged from 57% to 64% of the commits, 48% to 54% of the merges, and 54% to 89% of the churn. In teams of six, two students contributed from 44% to 52% of the commits, 42% to 53% of the merges, and 88% to 99% of the churn. The top contributor was the same across all contributions for one team, was the same for two contributions for two teams, and was different for each contribution for one team. The bottom contributor was the same for all contributions in two teams, was the same for two contributions in one team, and was different for all contributions in one team. The team with the distributed top contributors was the same team with the distributed bottom contributors, while the team with only one student as the top contributor was one of the teams with only one student being the bottom contributor.
Lastly, in SE5 minimum contributions ranged from 27 to 111 commits, 21 to 54 merges, and 1,717 to 7,307 churn, while the maximum contributions ranged from 66 to 268 commits, 53 to 153 merges, and 17,511 to 391,053 churn. Team 3's minimum contributor outperformed the maximum contributor of team 1 in commits and merges. In the six-member team, two students contributed 47% of the commits, 46% of the merges, and 63% of the churn. In the five-member teams, two students' contributions ranged from 54% to 65% of the commits, 53% to 60% of the merges, and 72% to 94% of the churn. The top contributor was the same across all contributions for one team and was the same for two contributions for two teams. The bottom contributor was the same for two contributions in two teams and was different for all contributions in one team.
Inequality indexes of student contributions within teams for the projects are shown in Table 7. Results show that inequality among team members' contributions was low to moderate, suggesting that top and bottom performers were distributed across teams.
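For readers unfamiliar with the three indexes, the following sketch computes them from a list of per-student totals using their standard textbook definitions. Whether this study applies additional normalization (for instance, to the Theil index) is not stated in this section, so the formulas and the sample values should be read as assumptions. The functions assume at least one positive value.

```python
import math


def hoover(values):
    """Hoover index: half the sum of absolute deviations of each
    member's share from the equal share 1/n."""
    total = sum(values)
    n = len(values)
    return 0.5 * sum(abs(v / total - 1 / n) for v in values)


def gini(values):
    """Gini index via the mean absolute difference between all pairs."""
    total = sum(values)
    n = len(values)
    diffs = sum(abs(a - b) for a in values for b in values)
    return diffs / (2 * n * total)


def theil(values):
    """Theil T index; zero contributions add nothing to the sum."""
    n = len(values)
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values if v > 0) / n


# Hypothetical churn totals: one student produced all of the churn.
churn = [0, 0, 0, 10]
indexes = (hoover(churn), gini(churn), theil(churn))
```

All three indexes are zero when every member contributes the same amount and grow as contributions concentrate in fewer members.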
Regarding the mobile projects, the behavior of projects M1 and M2 was somewhat similar, while project M3 was considerably different. For projects M1 and M2, commits had low to moderate inequality, while merges and churn each had low, moderate, and high inequality. Commits and merges tended to have low inequality for M1 and moderate inequality for M2, while churn tended to have moderate inequality for both projects. The only team in these projects that had low inequality across all metrics was team 3 of M2. M3 had low inequality in its teams for all of the metrics. Hence, inequality varied greatly among teams: some teams had similar contributions per team member, while others differed vastly in their team members' contributions.
Regarding SE projects, commits and merges had low inequality for all teams except SE1's commits for team 1. Churn had low, moderate, and high inequality, with most teams tending toward moderate inequality. Still, team 5 of SE1, teams 2 and 4 of SE2, and team 1 of SE4 had low inequality in churn for all metrics. In SE4 and SE5, four out of seven teams had a churn metric that indicates high inequality. Hence, SE projects had low inequality in general, while churn showed no clear tendency: some teams had a large disparity in churn, and others had more uniform churn.
Table 7: Inequality indexes of student contributions within teams (Hoover, Gini, and Theil indexes for commits, merges, and churn, per project and team).
The inter-decile ratios of student contributions within teams are shown in Fig. 9. We found inequality between the top and bottom contributors within teams.
The 20/20 ratio for M1 had commits between 3.05 and 19, merges between 2 and N/A, and churn between 5.93 and 181.89. High inequality was exhibited across all contribution types except the merges for team 4, which had moderate inequality. Team 4 had the lowest inequality for all of the contributions. The Palma ratio had commits ranging between 1.45 and 2.38, merges between 0.86 and N/A, and churn between 2.40 and 10.43. Teams 1 and 2 had high inequality for commits, while teams 3 and 4 had moderate inequality. The merge inequality was high for teams 2 and 3, moderate for team 1, and low for team 4. The churn had high inequality for all the teams. The 50/50 ratio ranged between 2.15 and 2.6 for commits, 1.48 and N/A for merges, and 2.26 and 8.01 for churn. The commits had moderate inequality for all the teams. The merges had low inequality for teams 1 and 4, and high inequality for teams 2 and 3. The churn had high inequality, except for team 3 with moderate inequality.
For M2, the 20/20 ratio's commits ranged between 1.79 and 35, merges between 5 and N/A, and churn between 3.45 and 207.22. All teams had high inequality, except team 3 in commits with low inequality. The Palma ratio ranged between 0.72 and 5 for commits, 1 and 5 for merges, and 1.22 and 75.26 for churn. Team 3 had low inequality for commits and merges, with moderate inequality for churn. Team 1's commit inequality was moderate. The rest of the ratios exhibited high inequality. The 50/50 ratio ranged between 1.32 and 5.86 for commits, 1.88 and 6 for merges, and 1.87 and 75.9 for churn. Team 3's contributions had low inequality, the commits for team 1 and the merges for team 2 had moderate inequality, and the rest of the contributions had high inequality.
In M3, the 20/20 ratio's commits ranged between 2.35 and 4.29, merges between 2.7 and 6.67, and churn between 1.46 and 3.01. The commits had high inequality except for team 3, the merges had high inequality except for team 4, and the churn had high inequality for team 2, moderate inequality for teams 1 and 4, and low inequality for team 3. The Palma ratio had commits ranging between 1.04 and 1.61, merges between 1.12 and 1.54, and churn between 0.72 and 1.43. The inequality of the contributions for the teams of M3 was moderate, except for the churn of team 3, which had low inequality. The 50/50 ratio's commits ranged between 1.45 and 2.54, merges between 1.59 and 2.59, and churn between 1.23 and 2.38. Team 2 of M3 had moderate inequality for this ratio, while all the other teams had low inequality across all contributions.
The 20/20 ratio for SE1 had commits ranging between 3.78 and 5.36, merges between 1.73 and 4, and churn between 6.44 and 197.69. The inequality was high for most contributions except in merges, where teams 3 and 4 had moderate inequality, and teams 2 and 5 had low inequality. The Palma ratio for SE1 had commits ranging between 1.22 and 2.52, merges between 0.86 and 2, and churn between 1.91 and 18.31. For commits, the teams had moderate inequality except for team 1. The merges had moderate inequality except for teams 2 and 5 with low inequality. The churn had high inequality except for team 5 with moderate inequality. The 50/50 ratio for SE1 had commits ranging between 1.63 and 2.35, merges between 1.36 and 2, and churn between 2.08 and 8.06. The commits had moderate inequality in teams 1, 3, and 5, and low inequality in teams 2 and 4. Merges had low inequality. The churn had high inequality except for team 5 with moderate inequality.
For SE2, the 20/20 ratio's commits ranged between 1.81 and 4.19, merges between 2.15 and 5.57, and churn between 5.21 and 21.95. The commit inequality was low for team 2 and high for the rest. The merge inequality was moderate for teams 2 and 4 and high for teams 1 and 3. The churn inequality was high for all teams. The Palma ratio ranged between 0.89 and 1.79 for commits, 1.04 and 1.7 for merges, and 1.21 and 10.53 for churn. The commits had moderate inequality except for team 2, with low inequality. The merges had moderate inequality for all the teams. The churn inequality was moderate for teams 2 and 4, and high for teams 1 and 3. The 50/50 ratio ranged between 1.38 and 2.88 for commits, 1.54 and 2.28 for merges, and 1.64 and 12.45 for churn. The commit and merge inequality was moderate for teams 1 and 3, and low for teams 2 and 4. The churn inequality was high for teams 1 and 3, moderate for team 2, and low for team 4.
In SE3, the 20/20 ratio's commits ranged between 1.69 and 3.59, merges between 1.96 and 7.67, and churn between 8.15 and 39.29. The inequality for commits and merges was low for team 4, moderate for team 1, and high for teams 2 and 3. The churn had high inequality for all the teams. For the Palma ratio, the commits ranged between 0.54 and 1, merges between 0.71 and 1.73, and churn between 2.44 and 10.04. The commits had low inequality for all teams. The merges had low inequality for all teams except team 2 with moderate inequality. Churn inequality was high across all teams. The 50/50 ratio ranged between 1.46 and 2.2 for commits, 1.56 and 2.75 for merges, and 5.03 and 12.67 for churn. Teams 1 and 4 had low inequality for commits and merges. Team 2 had low inequality for commits and moderate inequality for merges. Team 3 had moderate inequality for commits and merges. The churn inequality was high across all teams.
The 20/20 ratio for SE4 ranged between 2.48 and 3.42 for commits, 1.5 and 2.95 for merges, and 2.7 and 213.36 for churn. Teams 1, 3, and 4 had moderate inequality for commits, while team 2 had high commit inequality. The merges had low inequality except for team 4 with moderate inequality. The churn had high inequality except for team 1 with moderate inequality. The Palma ratio's commits ranged between 0.67 and 1.1, merges between 0.59 and 1.03, and churn between 1.18 and 119.43. The inequality for commits was moderate except for team 3 with low inequality, for merges it was low except for team 4 with moderate inequality, and for churn it was high except for team 1 with moderate inequality. For the 50/50 ratio, the commits ranged between 1.69 and 2.07, merges between 1.27 and 2.44, and churn between 1.67 and 120.89. Teams 1, 2, and 3 had low inequality for commits and merges, while team 4 had moderate inequality. Churn inequality was high, except for team 1 with low inequality.
In SE5, the 20/20 ratio ranged between 1.84 and 4.23 for commits, 1.98 and 3 for merges, and 9.01 and 53.52 for churn. The inequality for commits and merges was low for team 1, moderate for team 3, and high and moderate, respectively, for team 2. The churn had high inequality for all the teams. For the Palma ratio, the commits ranged between 0.73 and 1.9, merges between 0.77 and 1.38, and churn between 2.22 and 24.2. Team 1 had low inequality for commits and merges, while teams 2 and 3 had moderate inequality. The churn exhibited high inequality across all teams. The 50/50 ratio ranged between 1.54 and 2.3 for commits, 1.51 and 1.89 for merges, and 2.97 and 17.22 for churn. The commits had low inequality for all teams except team 2 with moderate inequality. The merges exhibited low inequality for all teams. The churn had high inequality for the teams, except for team 2 with moderate inequality.

RQ2: Contribution evolution over time
To answer this research question, we analyzed the frequencies of commits and merges over time.
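A frequency series of this kind can be derived from commit dates alone. The sketch below assumes the dates were exported one per line in ISO format, for example with `git log --date=short --pretty=format:%ad` (adding `--merges` or `--no-merges` to separate merges from regular commits). The function name `daily_counts` and the sample dates are hypothetical, and the tooling actually used in this study may differ.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(date_strings):
    """Count contributions per calendar day, including days with none.

    Days without contributions are kept with a count of zero so that
    gaps and peaks in the pace remain visible in the series.
    """
    counts = Counter(
        date.fromisoformat(s.strip()) for s in date_strings if s.strip()
    )
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    series = {}
    while day <= last:
        series[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return series


# Hypothetical export: two commits on one day, then a two-day gap.
series = daily_counts(["2021-03-01", "2021-03-01", "2021-03-04"])
```

Keeping the zero-count days is a deliberate choice: a heat map built from such a series shows idle periods as clearly as bursts of activity.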

Student contribution over time
A heat map of the contribution frequency of students to projects over time is shown in Fig. 10. The horizontal axis depicts time (tags on sprint milestones), and the vertical axis shows contributions (commits and merges) per project. According to the legend on the right, frequency is represented by a color: high frequencies are dark blue and low frequencies are light green. Data shows that students were contributing at a non-steady pace.
Mobile projects show a slow start, but frequency improves slightly over time. Still, the pace was not constant, as there were peaks around due dates. This indicates that the core agile principle of constant work was not being followed. Furthermore, contributions increase over time, as evident from hot zones toward the right end of the map. This might be because students become more familiar with the technologies, which reduces learning effort and increases productivity. It may also be the result of increased feature expectations from instructors, with students investing more time in the project to meet those expectations. M3 had the most contributions per day, followed by M1, while M2 had the fewest. SE projects behave similarly to mobile projects: they start with a low contribution frequency, but this frequency intensifies over time. Most contributions can be found around deadlines, though for SE2, SE3, SE4, and SE5 the peaks were reduced in the last sprints. SE projects had more contributions per day than mobile projects. In the first sprint, merges happened very close to the deadline, leading to difficult and time-consuming merges. However, after the first sprint, students improved their continuous integration skills to ease these problems and reduce integration defects.

Team contribution over time
A heat map of the contribution frequency of teams to projects over time is shown in Fig. 11. We found that the frequency pace varies between teams.
For SE projects, the frequency of contributions was not the same for all teams. Some teams commit and merge more frequently, while others contribute mostly around deadlines. The commits' highest frequency was concentrated in a few teams. Nonetheless, teams do improve their contribution frequency throughout the project.
Regarding mobile projects, we observed that most of the M1 and M2 teams did not contribute steadily. There were teams with significant gaps in contributions, like team 1 in M2. Yet some teams contributed more evenly, like team 3 in M2. This could indicate that not all teams follow the agile principle of constant work. This behavior could be explained by the fact that, after each sprint, students devote their time to completing assignments for other courses.

Student contribution within a team over time
Heat maps of the contribution frequency of students within teams over time are shown in Fig. 12. Data shows that there were variations in contribution frequency among students within a team.
In M1, students in teams 3 and 4 had, in general, more contributions. Team 3's students had the best commit frequency, without wide gaps. Most merges performed by students were sporadic and concentrated around deadlines. M2 also had very few and infrequent contributions. Merges, in general, were almost non-existent, and the few that occurred were concentrated around deadlines. Students of teams 1 and 2 committed almost exclusively around deadlines, while students of team 3 showed a much steadier pace. M3 contrasts with the other mobile projects, as there were more contributions and a steadier pace. Still, not all students had a steady pace, with some only contributing near deadlines. This phenomenon also occurs within teams, where one student has a more frequent pace while another contributes only near deadlines, as in team 4.
For SE projects, in SE1 some students had a very steady pace, committing almost constantly, while most students' frequency was not as steady. Merges seem to be performed around the same dates for all students in a team. SE2 had a steadier commit pace for most students in teams 1, 2, and 4. Still, within these teams, some team members do not follow this pattern, as in team 4. Merges for students in team 1 seem to be more frequent than in the other teams. For SE3, teams 1 and 2 had fewer commit peaks than teams 3 and 4. Team 4 had a steadier merge frequency across all of its students than the other teams. In SE4, most students in team 3 had a steady frequency, while in teams 1 and 4 the frequency varied more between students. Team 2 had a more sporadic frequency than the other teams. Finally, for SE5 the frequency and amount of contributions varied greatly between teams, with a similar pattern in the teams. Team 3 had the steadiest pace, followed by team 2, and lastly team 3. Still, for merges in team 1 some team members also do not seem to contribute at such a steady pace.

Discussion
The analysis of code contributions using git metrics allowed us to identify contributions from different perspectives. The churn metric was useful to determine the quantity of the contributions, while the commits and merges were helpful to evaluate the contribution process. Further research should be conducted to explore other attributes of contributions such as quality, complexity, difficulty, and value, providing more information about the contribution to a project.
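To make the churn metric concrete, the sketch below aggregates churn per author from text in the format produced by `git log --numstat --pretty=format:@%an`, where each commit starts with an `@`-prefixed author name followed by tab-separated `added`, `deleted`, and `path` fields. Treating churn as added plus deleted lines, skipping binary files (which git reports as `-`), and using `@` as a marker are assumptions for this illustration; the definition used in this study is given elsewhere in the paper.

```python
from collections import defaultdict


def churn_by_author(log_text):
    """Sum added and deleted lines per author from numstat-style output."""
    churn = defaultdict(int)
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            author = line[1:]  # marker line introduced by the pretty format
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separators or unexpected lines
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        churn[author] += int(added) + int(deleted)
    return dict(churn)


# Hypothetical log excerpt with two authors and one binary file.
sample = "@Alice\n10\t2\tapp/main.py\n-\t-\tlogo.png\n\n@Bob\n3\t3\tREADME.md\n"
totals = churn_by_author(sample)
```

Because a single generated or vendored file can add thousands of lines, totals computed this way measure quantity only, which matches the role the churn metric plays in this discussion.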
The inequality in the contributions helps to determine how contributions are distributed among developers and teams. We found that all our projects had inequalities; hence, inequality metrics were useful to determine how unequal students' contributions were within a project, how unequal team contributions were within a project, and how unequal students' contributions were within their team. Additionally, it is important to recognize high achievers to reward them and low achievers to help them improve.
Analyzing the contribution to modules makes it possible to identify cross-functional contributions by students. It is important for teachers to find out whether students are participating in all of the project functionality and gaining the cross-functional skills needed in their careers. Tracking this might motivate students to make contributions to all of the modules. Further research can be conducted on what students are contributing to and on the ownership of these artifacts.
The contribution frequency enables the discovery of patterns regarding the pace of contributions. This helped in identifying whether students' pace was constant and in determining which teams or students need to improve their pace. Also, the time of merges helped find out whether continuous integration practices were followed. Additional research could analyze the evolution and impact of other contribution aspects such as churn and quality.
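One simple way to quantify how much work clusters around deadlines is the share of contributions made within a few days before a sprint deadline. This measure is not part of the study; the function name `share_near_deadlines`, the two-day window, and the sample dates are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def share_near_deadlines(commit_days, deadlines, window_days=2):
    """Fraction of commits made on a deadline day or within
    window_days before it. Returns None when there are no commits."""
    if not commit_days:
        return None
    window = timedelta(days=window_days)
    near = sum(
        1
        for day in commit_days
        if any(timedelta(0) <= deadline - day <= window for deadline in deadlines)
    )
    return near / len(commit_days)


# Hypothetical sprint ending on 14 March with one early commit
# and three commits in the final days.
commit_days = [date(2021, 3, 1), date(2021, 3, 12), date(2021, 3, 13), date(2021, 3, 14)]
share = share_near_deadlines(commit_days, [date(2021, 3, 14)])
```

A value close to 1 would correspond to the deadline-driven pattern seen in the heat maps, while a value close to the window's share of the sprint length would suggest a steadier pace.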
The proposed approach can be used in any software development project that uses a Git-based repository, and is interested in gathering process and product data. Furthermore, metrics such as commits, merges, churn and inequality can be automatically calculated for any software development project with the goal of gaining an objective understanding of developers' contributions or determining how software engineering practices were followed. In the academic context instructors can benefit from a tool like this that aids in grading and provides data to identify improvement opportunities for courses. Likewise, students may benefit from getting objective, timely feedback on their performance, which helps them to improve their development skills. Moreover, this approach with the visualization of these metrics can be used in real-time sessions with students, in order to help them improve their software practices, such as in continuous integration and contribution pace.
With regard to how generalizable the results are, we found that students tend to work near deadlines and that there is considerable inequality in the projects. This means that instructors must consider the fact that students do not work at a constant pace and, when assigning grades, account for contribution dynamics in which students do not participate in equal measure. Hence, information gathered from projects can be used to identify improvement opportunities in courses. These results may be generalized to projects with similar characteristics; however, they would be influenced by aspects like students' motivation towards the project, group dynamics, year of study, previous knowledge, and specific characteristics of the developed project.
We acknowledge that our approach should be assessed in a complementary manner by gathering students' perceptions, including the benefits or limitations, of using these metrics. Their perceptions would be useful to determine if tools like ours really help improve the process, product, or skills. This could be part of a future analysis where other metrics such as quality of the contributions are investigated [25] and their impact in courses analyzed.

Conclusions
Measuring student and team contributions in project-based courses is a challenge. Therefore, instructors would greatly benefit from using metrics that help them objectively quantify students' and teams' contributions. In this work, we analyzed the contributions in eight projects with multiple collaborative teams, using two aggregate inequality metrics (inter-decile ratios and inequality indexes) and five git metrics (commits, merges, churn, commits per day and merges per day). Contributions were analyzed for each student and team in the project and students within their teams. We used aggregate measures such as inequality indexes and inter-decile ratios to determine the inequality of contributions in projects. We found considerable inequality in students' contributions with a focus on contributing to functionality. Furthermore, based on the daily commits and merges, we found that students do not have a steady pace, and improvements in contribution frequency are not seen across all students.
Our findings suggest that the distribution of contributions provides critical information for assessing the learning process. It helps in assessing students' contributions and how well the methodologies are applied in the projects. Hence, the use of such metrics may benefit instructors and students alike. Furthermore, applying these metrics in a professional setting may be useful to detect risky projects that rely on few developers, reward developers for their contributions, or assign developers to tasks where they are more productive.
In future work, we plan to examine the complexity, quality, and value of the contributions. We believe such qualitative factors can complement an assessment based on quantitative metrics. Furthermore, we plan to analyze software ownership to determine trends in the components, modules, and files. Finally, we plan on gathering instructors' and students' perceptions to determine the benefits and limitations of using these metrics.