Statocaml is a tool that gathers development data from a GitHub repository and generates statistics.
It was initially developed to study the development of OCaml, but it can be used on any GitHub repository.
A presentation (in French) is available here.
Development is hosted here.
opam install statocaml_go statocaml_fetch
or
opam pin add https://gitlab.inria.fr/guesdon/statocaml.git
Statocaml uses some external tools which must be installed too:
You will need a valid GitHub token to fetch information from GitHub.
Create a configuration file conf.json (in JSON format)
like this one (here the ocaml/conf.json file for the
ocaml/ocaml GitHub
repository):
{
"data_dir": "data", // where to store data
"github": {
"token": "...", // put your token here
"cache_dir": "cache", // directory used for caching
"user": "ocaml", // github user the repository belongs to
"repo": "ocaml", // the repository to fetch
"fetch_gh_users": [
"bnigito", "bstarynk", "cagdasbozman", "claudemarche", "enaudon",
"Lucccyo", "clappski", "djs55", "dweil", "flindgren", "gnecula",
"Hirrolot", "jaked", "jeffsco", "jessicah", "jserot", "JuliaLawall",
"kerneis", "khooyp", "klartext", "kyleheadley", "lebotlan", "letouzey",
"MadCoder", "maverickwoo", "mkoconnor", "monate", "mrvn",
"Nick-Chapman", "nickgian", "pdenys", "pocarist", "revskill10",
"rdicosmo", "roshanjames", "sacerdot", "signoles", "smimram", "strub",
"tertium", "TheAspiringHacker", "vog", "zoep"
] // additional users to fetch
}
}
The fetch_gh_users field is required when a changelog is
provided and refers to contributions made before the code was imported to
GitHub: some contributors are then not associated with their GitHub accounts.
This field forces fetching these accounts, which
can then be referenced in the file given with the --gh-users
command line option of the statocaml_go tool.
The configuration file will be overwritten by the statocaml_fetch tool
so that it can perform incremental fetches, i.e. avoid downloading all data every time
the local data is updated.
The GitHub token can be given in three ways in the
token field:
as the string value of the field:
{
...
"token": "ghp_Km...",
...
}
in an environment variable whose name is given as the string value
of the field, beginning with $:
{
...
"token": "$GITHUB_TOKEN",
...
}
on the first line of a file whose name is given as the string value of
the field, beginning with . (relative filename) or
/ (absolute filename):
{
...
"token": "./github.token",
...
}
In the directory containing your conf.json, run:
$ statocaml_fetch
Use the -c option to indicate a different configuration
file. This operation can take many hours (or days), depending on the
size of the repository (number of commits, issues, pull requests,
…).
To update the locally fetched data, run the same command. This will
fetch only the data that is new since the last fetch, based on the
last_event_date field that is automatically added to the configuration
file.
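As an illustration of the three forms accepted by the token field described above, here is a minimal Python sketch of how such a value could be resolved. It is not Statocaml's actual code: the function name resolve_token and the stripping of surrounding whitespace are assumptions made for the example.

```python
import os


def resolve_token(value: str, env=os.environ) -> str:
    """Resolve a `token` field value (hypothetical helper, not part of Statocaml)."""
    if value.startswith("$"):
        # "$NAME": the token is read from the environment variable NAME.
        return env[value[1:]]
    if value.startswith((".", "/")):
        # "./file" or "/abs/file": the token is the first line of that file.
        # Stripping whitespace is an assumption of this sketch.
        with open(value, encoding="utf-8") as f:
            return f.readline().strip()
    # Any other value is taken as the token itself.
    return value
```

Keeping the token in an environment variable or in a separate file avoids storing the secret in conf.json, which is convenient if the configuration file is shared or committed.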
Run
$ statocaml_go
By default, the configuration file used is conf.json, but
another one can be specified with the -c option. Beware
that relative directories in the configuration file are resolved from the current
directory, not from the directory containing the configuration file.
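The caveat about relative directories can be made concrete with a small Python sketch. The helper resolve_conf_dir and the paths used are hypothetical; the sketch only models the rule stated above, not the tool's implementation.

```python
from pathlib import PurePosixPath


def resolve_conf_dir(value: str, cwd: str) -> PurePosixPath:
    """Model how a directory value from the configuration file is interpreted.

    Relative values are joined to the current directory; the location of the
    configuration file plays no role. Absolute values are kept unchanged.
    """
    return PurePosixPath(cwd) / value
```

For example, with "data_dir": "data" in ocaml/conf.json and a command run from /home/me/work with -c ocaml/conf.json, this model gives /home/me/work/data, not /home/me/work/ocaml/data.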
statocaml_go can use a changelog in JSON
format (with the --changelog option) to get the releases and
contributors of each issue/PR. For OCaml, Octachron’s ocaml-changelog-analyzer
is used to convert the OCaml Changes file to a JSON
file.
Main command line options are:
--html-outdir to indicate the directory in which to
generate the website; default is
statocaml-output.
--events to indicate a file with additional events
to display in some visualizations. The file is in JSON format, of the
form
[
{ "label": "my event 1", "date": (2025, 12, 31)},
{ "label": "my event 2", "date": (2025, 07, 14)},
...
]
--gh-users to indicate a JSON file with information
about contributors. This file is used to merge contributors found with
different names or emails. It has the following form:
[
{ "names": ["Alan Turing", "Turing Alan"], "gh_login": "alanturing",
"emails": ["alan.turing@foo.bar", "alan@turing.uk"]
},
...
]
--groups to indicate a JSON file containing the
definition of groups. This produces additional statistics for these
groups, by merging statistics of their members. A group may have a
GitHub account indicated. The file has the following form:
{
"Group 1": {
"members": {
"githublogin1": [{"start":"2023-01-01T00:00:00-00:00"}],
"githublogin2": [{"start":"2013-11-01T00:00:00-00:00","stop":"2017-11-01T00:00:00-00:00"}],
"githublogin3":[],
... },
"gh_account": "githubloginofgroup"
},
...
}
--gui to launch a Graphical User Interface to create
some plots according to entered parameters; these parameters can be
copied into a JSON file. When this file is given with the
--plots option, these additional plots are integrated into the
generated website.
--subsystems to indicate a file containing subsystem
definitions. Subsystems are defined by a name and an id; regular
expressions are used to associate filenames in the git repository to
subsystems. See ocaml/subsystems.json for an example of
definitions used for OCaml.
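To illustrate what the --gh-users file is for, the following Python sketch builds lookup tables from entries of the form shown above and maps a (name, email) pair found in the history to a single GitHub login. The functions build_index and login_for are hypothetical, and the exact-match, email-first lookup is an assumption of this sketch; Statocaml's own merging rules may differ.

```python
import json

# Sample content in the form documented for the --gh-users file.
GH_USERS_JSON = """
[
  { "names": ["Alan Turing", "Turing Alan"], "gh_login": "alanturing",
    "emails": ["alan.turing@foo.bar", "alan@turing.uk"]
  }
]
"""


def build_index(entries):
    """Return (by_name, by_email) dictionaries mapping to GitHub logins."""
    by_name, by_email = {}, {}
    for entry in entries:
        for name in entry.get("names", []):
            by_name[name] = entry["gh_login"]
        for email in entry.get("emails", []):
            by_email[email] = entry["gh_login"]
    return by_name, by_email


def login_for(name, email, index):
    """Look up a contributor by email first, then by name; None if unknown."""
    by_name, by_email = index
    return by_email.get(email) or by_name.get(name)


index = build_index(json.loads(GH_USERS_JSON))
```

With this index, both "Turing Alan" with an unknown email and an unknown name with alan@turing.uk resolve to the same login, so their contributions would be counted together.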