This is a data source that tracks public activity on GitHub and provides reports made up of dimensions and metrics. It helps us analyze developers' behaviour and explore trending technologies, growing organizations or popular languages.
The dataset consists of public actions made by an actor (user, organization or bot) in a repository which may belong to an organization.
Data can be fetched by queries with specific dimensions and metrics.
query
{
"hour_from" : "2021-01-14-12",
"hour_to" : "2021-01-17-11",
"dimensions" : [ "repository" ],
"metrics" : [ "stars" ],
"limit" : 1
}
results
[ {
"repository" : "Developer-Y/cs-video-courses",
"stars" : 2116
} ]
The results in this document are the actual rows that you get if you run the queries. In the above query we specified the dimension repository and the metric stars to get the project that was starred the most. A query can contain multiple dimensions and metrics. Let's see which languages and organizations (if any) lead in terms of forks and merged pull requests.
query
{
"hour_from" : "2021-01-14-12",
"hour_to" : "2021-01-17-11",
"dimensions" : [ "organization", "repo_language" ],
"metrics" : [ "forks", "pull_requests_merged" ],
"limit" : 10
}
results
[ {
"organization" : "None",
"repo_language" : "n/a",
"forks" : 53440,
"pull_requests_merged" : 0
}, {
"organization" : "None",
"repo_language" : "JavaScript",
"forks" : 5903,
"pull_requests_merged" : 27867
}, {
"organization" : "None",
"repo_language" : "None",
"forks" : 5084,
"pull_requests_merged" : 12193
}, {
"organization" : "None",
"repo_language" : "Python",
"forks" : 4628,
"pull_requests_merged" : 10990
}, {
"organization" : "learn-co-curriculum",
"repo_language" : "n/a",
"forks" : 2748,
"pull_requests_merged" : 0
}, {
"organization" : "None",
"repo_language" : "Java",
"forks" : 2209,
"pull_requests_merged" : 8699
}, {
"organization" : "None",
"repo_language" : "HTML",
"forks" : 1707,
"pull_requests_merged" : 8428
}, {
"organization" : "None",
"repo_language" : "Jupyter Notebook",
"forks" : 1278,
"pull_requests_merged" : 1459
}, {
"organization" : "None",
"repo_language" : "C++",
"forks" : 1038,
"pull_requests_merged" : 4355
}, {
"organization" : "learn-co-students",
"repo_language" : "Ruby",
"forks" : 1037,
"pull_requests_merged" : 0
} ]
The result rows are sorted by the provided metrics. We can specify up to 4 dimensions and 5 metrics. For more information, please check the JSON schema.
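The limits on dimensions and metrics can be checked on the client before a query is sent. The following is a minimal sketch, assuming the limits stated above (4 dimensions, 5 metrics); check_query_shape is a hypothetical helper, not part of the service.

```python
MAX_DIMENSIONS = 4  # from the text above; may change
MAX_METRICS = 5     # from the text above; may change


def check_query_shape(query):
    """Return a list of problems; an empty list means the counts look fine."""
    problems = []
    dimensions = query.get("dimensions", [])
    metrics = query.get("metrics", [])
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(
            f"{len(dimensions)} dimensions given, at most {MAX_DIMENSIONS} allowed"
        )
    if len(metrics) > MAX_METRICS:
        problems.append(
            f"{len(metrics)} metrics given, at most {MAX_METRICS} allowed"
        )
    return problems


query = {
    "hour_from": "2021-01-14-12",
    "hour_to": "2021-01-17-11",
    "dimensions": ["organization", "repo_language"],
    "metrics": ["forks", "pull_requests_merged"],
    "limit": 10,
}
print(check_query_shape(query))  # prints []
```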
actor_company
actor_country
actor_is_hireable
actor_type
org_country
org_location
organization
repo_is_fork
repo_language
repo_license
repository
branch_create_actors
branch_delete_actors
branches_created
branches_deleted
commit_comment_actors
commit_comments
fork_actors
forks
issue_close_actors
issue_comment_actors
issue_comments
issue_open_actors
issue_reopen_actors
issues_closed
issues_opened
issues_reopened
member_add_actors
members_added
pr_review_comment_actors
pr_review_comments
pull_request_close_actors
pull_request_merge_actors
pull_request_open_actors
pull_request_reopen_actors
pull_requests_closed
pull_requests_merged
pull_requests_opened
pull_requests_reopened
push_actors
pushes
release_actors
releases
repo_open_sourced_actors
repos_open_sourced
repositories_created
repository_create_actors
star_actors
stars
tag_create_actors
tag_delete_actors
tags_created
tags_deleted
wiki_page_save_actors
wiki_pages_saved
The suffix _actors indicates a metric that counts the distinct actors who have performed an action.
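Since every distinct-actor metric in the list above ends in _actors, the two kinds of metric can be told apart by name alone. A small sketch, where split_metrics is a hypothetical helper:

```python
ACTOR_SUFFIX = "_actors"  # suffix used by the distinct-actor metrics listed above


def split_metrics(metrics):
    """Split metric names into (action counts, distinct-actor counts)."""
    counts = [m for m in metrics if not m.endswith(ACTOR_SUFFIX)]
    actors = [m for m in metrics if m.endswith(ACTOR_SUFFIX)]
    return counts, actors


counts, actors = split_metrics(["stars", "star_actors", "forks", "fork_actors"])
print(counts)  # ['stars', 'forks']
print(actors)  # ['star_actors', 'fork_actors']
```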
hour_from
hour_to
dimensions
metrics
The hour_from and hour_to parameters are strings in the yyyy-MM-dd-HH format that restrict the time range in which the actions occurred. The dataset is updated hourly, but historical data is retained for 3 days.
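The yyyy-MM-dd-HH pattern corresponds to the strftime format %Y-%m-%d-%H. The sketch below builds a pair of hour strings from a datetime. It assumes that hour_to is inclusive (the example queries span 72 hourly values, which matches the 3-day retention) and leaves the time zone to the caller, since this document does not state one. hour_window is a hypothetical helper.

```python
from datetime import datetime, timedelta

HOUR_FORMAT = "%Y-%m-%d-%H"  # strftime equivalent of yyyy-MM-dd-HH


def hour_window(end, hours):
    """Return (hour_from, hour_to) covering `hours` hourly values ending at `end`.

    Assumes both bounds are inclusive, which is not confirmed by the text.
    """
    end = end.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=hours - 1)
    return start.strftime(HOUR_FORMAT), end.strftime(HOUR_FORMAT)


# Reproduces the range used in the example queries.
print(hour_window(datetime(2021, 1, 17, 11, 30), 72))
# ('2021-01-14-12', '2021-01-17-11')
```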
The limit parameter restricts the number of rows in the result set. It is an optional parameter with a maximum value of 1000.
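Putting the parameters together, a query body can be assembled as a plain dictionary, with limit added only when it is given. This is a sketch: build_query is a hypothetical helper, and the lower bound of 1 for limit is an assumption, as only the maximum is stated above.

```python
MAX_LIMIT = 1000  # stated maximum; may change


def build_query(hour_from, hour_to, dimensions, metrics, limit=None):
    """Assemble a query dict; `limit` is optional and left out when None."""
    query = {
        "hour_from": hour_from,
        "hour_to": hour_to,
        "dimensions": list(dimensions),
        "metrics": list(metrics),
    }
    if limit is not None:
        # Lower bound of 1 is an assumption; the text only gives the maximum.
        if not 1 <= limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        query["limit"] = limit
    return query


query = build_query("2021-01-14-12", "2021-01-17-11", ["repository"], ["stars"], limit=1)
print(query["limit"])  # 1
```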
If we want to retrieve rows of specific dimension values, we can use filters.
query
{
"hour_from" : "2021-01-14-12",
"hour_to" : "2021-01-17-11",
"dimensions" : [ "repository" ],
"metrics" : [ "star_actors" ],
"filters" : {
"org_country" : [ "US", "GB" ],
"repo_license" : [ "Apache-2.0" ]
},
"limit" : 5
}
results
[ {
"repository" : "microsoft/playwright",
"star_actors" : 98
}, {
"repository" : "microsoft/TypeScript",
"star_actors" : 86
}, {
"repository" : "google-research/google-research",
"star_actors" : 57
}, {
"repository" : "NationalSecurityAgency/ghidra",
"star_actors" : 50
}, {
"repository" : "facebookresearch/deit",
"star_actors" : 47
} ]
These are the top 5 repositories that belong to organizations located in the US or GB and have an Apache-2.0 license. Filter values are case sensitive.
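The filter semantics described above (any of the listed values within one dimension, all filtered dimensions together, exact case) can be illustrated on the client side. This is only a sketch of the matching rule, not the service's implementation; row_matches and the sample rows are made up for illustration.

```python
def row_matches(row, filters):
    """True if, for every filtered dimension, the row's value is one of the listed values."""
    return all(row.get(dim) in allowed for dim, allowed in filters.items())


filters = {"org_country": ["US", "GB"], "repo_license": ["Apache-2.0"]}

# Made-up rows, used only to show the matching rule.
rows = [
    {"org_country": "US", "repo_license": "Apache-2.0"},
    {"org_country": "GB", "repo_license": "MIT"},
    {"org_country": "us", "repo_license": "Apache-2.0"},  # wrong case, no match
]
print([row_matches(r, filters) for r in rows])  # [True, False, False]
```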
You can run a query by making an HTTP POST request. Include an Authorization and Content-Type header and provide the query as the body.
curl -X POST \
  https://dev-source.herokuapp.com/ \
  -H 'Authorization: token XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' \
  -H 'Content-Type: application/json' \
  -d '{
    "hour_from": "2021-01-14-12",
    "hour_to": "2021-01-17-11",
    "dimensions": [
      "repository"
    ],
    "metrics": [
      "star_actors"
    ],
    "filters": {
      "org_country": ["US", "GB"],
      "repo_license": ["Apache-2.0"]
    },
    "limit": 10
  }'
Replace XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX with your actual private token and you are ready to go. There is a limit of one request per 3 seconds. Try not to exceed it! These examples are designed to illustrate a few use cases.
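The same request can be made from Python with only the standard library. The sketch below mirrors the curl call and spaces requests at least 3 seconds apart. The names build_request, Throttle and run_query are hypothetical, and the assumption that the response body is the JSON array shown in the results above is not confirmed beyond those examples.

```python
import json
import time
import urllib.request

ENDPOINT = "https://dev-source.herokuapp.com/"
MIN_INTERVAL = 3.0  # seconds between requests, per the stated limit


def build_request(query, token):
    """Build (but do not send) the POST request for a query dict."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


class Throttle:
    """Sleep as needed so that calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def run_query(query, token, throttle):
    """Send one query and return the decoded JSON response (assumed to be the result rows)."""
    throttle.wait()
    with urllib.request.urlopen(build_request(query, token)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Sharing one Throttle instance across all calls to run_query keeps a script under the request limit without scattering sleep calls through the code.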
Many options, like the maximum allowed limit or the maximum number of dimensions and metrics, may be changed in the future. This documentation, though, is generated dynamically, so you can be sure that it is always up to date.
The service is provided by baresquare.