If you want to manually extract data from a Git repository, this is the tool to use.
It has two parts: repo_info_extractor (the core) and a wrapper around it for handling multiple repositories.
repo_info_extractor is the same implementation we use to automatically extract data from GitHub or GitLab repositories. It gets commit-related data from one Git repository for a given author and saves it to a JSON file. If you want to try it, see the installation guide here.
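As a rough illustration of that single-repository step, the sketch below collects commit data for one author with plain `git log --numstat` and writes it to a JSON file. It is not repo_info_extractor itself: the function names, the `@@commit@@` marker, and the JSON layout are assumptions made for this example.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical marker used to recognise the header line of each commit.
COMMIT_MARK = "@@commit@@"


def read_git_log(repo_path: str, author: str) -> str:
    """Run `git log` for one author and return the raw text output."""
    cmd = [
        "git", "-C", repo_path, "log",
        f"--author={author}",
        "--numstat",  # per-file added/removed line counts
        f"--pretty=format:{COMMIT_MARK}%H|%ae|%aI",  # hash, author email, ISO date
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_git_log(raw: str) -> list:
    """Turn the raw log text into a list of commit dictionaries."""
    commits = []
    for line in raw.splitlines():
        if line.startswith(COMMIT_MARK):
            commit_hash, email, date = line[len(COMMIT_MARK):].split("|", 2)
            commits.append(
                {"hash": commit_hash, "email": email, "date": date, "files": []}
            )
        elif line.strip() and commits:
            added, removed, filename = line.split("\t", 2)
            commits[-1]["files"].append({
                "filename": filename,
                # --numstat prints "-" for binary files; count those as 0.
                "added": int(added) if added.isdigit() else 0,
                "removed": int(removed) if removed.isdigit() else 0,
            })
    return commits


def extract_to_json(repo_path: str, author: str, out_file: str) -> None:
    """Extract one author's commits from one repository into a JSON file."""
    commits = parse_git_log(read_git_log(repo_path, author))
    Path(out_file).write_text(json.dumps(commits, indent=2))
```

Keeping the parsing separate from the `git` call makes it possible to check the parser against a saved log without needing a repository on disk.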
If you have multiple repositories, using repo_info_extractor on each one is not very convenient. This is why we created a wrapper around repo_info_extractor. If you want to use it, first you have to set up the
How does it work?
- Gets the list of repositories from the given source. It works with GitHub, GitLab, and Bitbucket, and is easy to extend to any other repository hosting service
- Downloads the repositories
- Uses repo_info_extractor to extract data from the repositories
- Uploads the result to CodersRank
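The four steps above can be sketched as a small pipeline. This is only an outline under stated assumptions: `process_repositories` and its four callables are hypothetical names, not the wrapper's real interface. Passing the repository source in as a function is one way to support an additional hosting service.

```python
from typing import Callable, Iterable


def process_repositories(
    list_repos: Callable[[], Iterable[str]],
    download: Callable[[str], str],
    extract: Callable[[str], dict],
    upload: Callable[[dict], None],
) -> int:
    """Run the four steps for every repository; return how many were processed."""
    processed = 0
    for repo_url in list_repos():        # 1. get the list of repositories
        local_path = download(repo_url)  # 2. download the repository
        result = extract(local_path)     # 3. extract data from the local copy
        upload(result)                   # 4. upload the result
        processed += 1
    return processed


# Usage with stand-in functions; nothing is cloned or sent anywhere.
uploaded = []
count = process_repositories(
    list_repos=lambda: ["https://example.com/a.git", "https://example.com/b.git"],
    download=lambda url: "/tmp/" + url.rsplit("/", 1)[-1],
    extract=lambda path: {"repo": path, "commits": []},
    upload=uploaded.append,
)
```

In a real setup, `download` would clone the repository and `extract` would call repo_info_extractor on the cloned directory.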
What data is extracted exactly:
- Remote URL (hashed)
- Email and username of the commit author (hashed)
- Commit hash
- Programming language of the changed file
- Lines added and removed
- Detected libraries
- Filename (hashed)
- Date and time of the commit
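To make the list more concrete, here is one possible shape for a single extracted record. The field names, the use of SHA-256 for the hashed values, and the choice to attach languages and libraries to each changed file are all assumptions for illustration; the actual JSON produced by the tool may be structured differently.

```python
import hashlib
from dataclasses import dataclass, field


def hash_value(value: str) -> str:
    """Assumed hashing scheme (SHA-256 hex digest); the real tool may differ."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class ChangedFile:
    filename_hash: str  # filename (hashed)
    language: str       # programming language of the changed file
    lines_added: int
    lines_removed: int
    libraries: list = field(default_factory=list)  # detected libraries


@dataclass
class CommitRecord:
    remote_url_hash: str    # remote URL (hashed)
    author_email_hash: str  # email of the commit author (hashed)
    author_name_hash: str   # username of the commit author (hashed)
    commit_hash: str        # stored as-is, not hashed again
    date: str               # date and time of the commit
    files: list = field(default_factory=list)


def make_record(remote_url, author_email, author_name, commit_hash, date, files):
    """Build a record, hashing every field the list marks as hashed."""
    return CommitRecord(
        remote_url_hash=hash_value(remote_url),
        author_email_hash=hash_value(author_email),
        author_name_hash=hash_value(author_name),
        commit_hash=commit_hash,
        date=date,
        files=[
            ChangedFile(hash_value(name), language, added, removed, list(libs))
            for name, language, added, removed, libs in files
        ],
    )
```

With this layout, the remote URL, author identity, and filenames never appear in clear text in the record, while the commit hash, language, line counts, libraries, and date remain readable.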