DevSource

DevSource is a data source that tracks public activity on GitHub and provides reports made up of dimensions and metrics. It helps us analyze developer behaviour and explore trending technologies, growing organizations and popular languages.

The dataset consists of public actions performed by an actor (a user, organization or bot) in a repository, which may belong to an organization.

Queries

Data is fetched with queries that specify dimensions and metrics.

query

{
  "hour_from" : "2020-04-05-05",
  "hour_to" : "2020-04-08-04",
  "dimensions" : [ "repository" ],
  "metrics" : [ "stars" ],
  "limit" : 1
}

results

[ {
  "repository" : "rclone/rclone",
  "stars" : 914
} ]

The results in this document are actual rows returned by these queries. In the query above, we specified the dimension repository and the metric stars, with a limit of 1, to get the repository that was starred the most. A query can contain multiple dimensions and metrics. For example, the next query shows which languages are used, together with the organizations they belong to (if any), in terms of how many forks and merged pull requests they received.

query

{
  "hour_from" : "2020-04-05-05",
  "hour_to" : "2020-04-08-04",
  "dimensions" : [ "organization", "repo_language" ],
  "metrics" : [ "forks", "pull_requests_merged" ],
  "limit" : 10
}

results

[ {
  "organization" : "None",
  "repo_language" : "n/a",
  "forks" : 64260,
  "pull_requests_merged" : 0
}, {
  "organization" : "None",
  "repo_language" : "None",
  "forks" : 6540,
  "pull_requests_merged" : 11178
}, {
  "organization" : "None",
  "repo_language" : "JavaScript",
  "forks" : 5123,
  "pull_requests_merged" : 32521
}, {
  "organization" : "None",
  "repo_language" : "Python",
  "forks" : 4817,
  "pull_requests_merged" : 10119
}, {
  "organization" : "learn-co-students",
  "repo_language" : "Ruby",
  "forks" : 4446,
  "pull_requests_merged" : 1
}, {
  "organization" : "None",
  "repo_language" : "Java",
  "forks" : 3218,
  "pull_requests_merged" : 8340
}, {
  "organization" : "None",
  "repo_language" : "Jupyter Notebook",
  "forks" : 2082,
  "pull_requests_merged" : 1548
}, {
  "organization" : "None",
  "repo_language" : "HTML",
  "forks" : 2042,
  "pull_requests_merged" : 9022
}, {
  "organization" : "learn-co-students",
  "repo_language" : "n/a",
  "forks" : 1791,
  "pull_requests_merged" : 0
}, {
  "organization" : "None",
  "repo_language" : "C",
  "forks" : 1558,
  "pull_requests_merged" : 3046
} ]

The result rows are sorted in descending order by the provided metrics; in the example above they are ordered by forks, the first metric listed. A query can specify up to 4 dimensions and 5 metrics. For more information, see the JSON schema.
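A minimal client-side check of these constraints could look like the following Python sketch. It assumes the limits stated in this documentation (up to 4 dimensions, up to 5 metrics, a limit of at most 1000) and the yyyy-MM-dd-HH hour format described under Required Parameters. The function name validate_query is hypothetical, and the service's JSON schema remains the authoritative definition.

```python
from datetime import datetime

# Limits as stated in this documentation; they may change in the future.
MAX_DIMENSIONS = 4
MAX_METRICS = 5
MAX_LIMIT = 1000

DIMENSIONS = {
    "actor_company", "actor_country", "actor_is_hireable", "actor_type",
    "org_country", "org_location", "organization", "repo_is_fork",
    "repo_language", "repo_license", "repository",
}


def validate_query(query: dict) -> list:
    """Return a list of problems found in a DevSource query (empty if none).

    Hypothetical helper; it only checks the constraints listed in these docs.
    """
    problems = []
    for key in ("hour_from", "hour_to", "dimensions", "metrics"):
        if key not in query:
            problems.append(f"missing required parameter: {key}")
    for key in ("hour_from", "hour_to"):
        if key in query:
            try:
                # yyyy-MM-dd-HH, e.g. "2020-04-05-05"
                datetime.strptime(query[key], "%Y-%m-%d-%H")
            except (TypeError, ValueError):
                problems.append(f"{key} is not in yyyy-MM-dd-HH format")
    dims = query.get("dimensions", [])
    if len(dims) > MAX_DIMENSIONS:
        problems.append(f"at most {MAX_DIMENSIONS} dimensions are allowed")
    for dim in dims:
        if dim not in DIMENSIONS:
            problems.append(f"unknown dimension: {dim}")
    for dim in query.get("filters", {}):
        if dim not in DIMENSIONS:
            problems.append(f"unknown filter dimension: {dim}")
    if len(query.get("metrics", [])) > MAX_METRICS:
        problems.append(f"at most {MAX_METRICS} metrics are allowed")
    limit = query.get("limit")
    if limit is not None and limit > MAX_LIMIT:
        problems.append(f"limit must not exceed {MAX_LIMIT}")
    return problems
```

Running a check like this before sending a request avoids spending a rate-limited call on a query the service would reject.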

Dimensions

actor_company

actor_country

actor_is_hireable

actor_type

org_country

org_location

organization

repo_is_fork

repo_language

repo_license

repository

Metrics

branch_create_actors

branch_delete_actors

branches_created

branches_deleted

commit_comment_actors

commit_comments

fork_actors

forks

issue_close_actors

issue_comment_actors

issue_comments

issue_open_actors

issue_reopen_actors

issues_closed

issues_opened

issues_reopened

member_add_actors

members_added

pr_review_comment_actors

pr_review_comments

pull_request_close_actors

pull_request_merge_actors

pull_request_open_actors

pull_request_reopen_actors

pull_requests_closed

pull_requests_merged

pull_requests_opened

pull_requests_reopened

push_actors

pushes

release_actors

releases

repo_open_sourced_actors

repos_open_sourced

repositories_created

repository_create_actors

star_actors

stars

tag_create_actors

tag_delete_actors

tags_created

tags_deleted

wiki_page_save_actors

wiki_pages_saved

The suffix _actors indicates a metric that counts the number of distinct actors who performed an action. For example, stars counts star actions, while star_actors counts the distinct actors who starred.
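To make the distinction concrete, here is a small Python sketch that computes both kinds of metric from a handful of hypothetical star events. The event tuples, repository names and helper name are illustrative only; the service computes these metrics server-side, and its internal representation is not documented here.

```python
from collections import defaultdict

# Hypothetical (actor, repository) star events, for illustration only.
events = [
    ("alice", "example/repo-a"),
    ("bob", "example/repo-a"),
    ("alice", "example/repo-a"),  # repeated action by the same actor
    ("carol", "example/repo-b"),
]


def stars_and_star_actors(events):
    """Return {repository: (stars, star_actors)} for a list of star events."""
    counts = defaultdict(int)
    actors = defaultdict(set)
    for actor, repo in events:
        counts[repo] += 1        # every action counts toward stars
        actors[repo].add(actor)  # each actor counts once toward star_actors
    return {repo: (counts[repo], len(actors[repo])) for repo in counts}


print(stars_and_star_actors(events))
# {'example/repo-a': (3, 2), 'example/repo-b': (1, 1)}
```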

Required Parameters

hour_from

hour_to

dimensions

metrics

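The hour_from and hour_to parameters are strings in the yyyy-MM-dd-HH format (for example, 2020-04-05-05) that restrict the time range in which the actions occurred. The dimensions and metrics parameters are arrays of names taken from the lists above. The dataset is updated hourly, and historical data are retained for 3 days.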
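In Python, the yyyy-MM-dd-HH format corresponds to the strftime pattern %Y-%m-%d-%H. The sketch below builds an hour_from/hour_to pair covering the most recent 72 hours. It assumes that hours are interpreted in UTC, that hour_to is inclusive (consistent with the examples here, where 2020-04-05-05 to 2020-04-08-04 spans 72 hours) and that the last fully elapsed hour is the newest one available. None of these points is stated explicitly, so adjust them if the service behaves differently. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HOUR_FORMAT = "%Y-%m-%d-%H"  # yyyy-MM-dd-HH, e.g. "2020-04-05-05"


def last_hours_window(hours: int = 72, now: Optional[datetime] = None) -> dict:
    """Return hour_from/hour_to strings covering the `hours` most recent full hours.

    Assumes UTC and an inclusive hour_to (assumptions, not documented behaviour).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Last fully elapsed hour, truncated to the start of that hour.
    hour_to = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    # With an inclusive hour_to, the window starts hours - 1 hours earlier.
    hour_from = hour_to - timedelta(hours=hours - 1)
    return {
        "hour_from": hour_from.strftime(HOUR_FORMAT),
        "hour_to": hour_to.strftime(HOUR_FORMAT),
    }
```

For example, last_hours_window(now=datetime(2020, 4, 8, 5, 30, tzinfo=timezone.utc)) returns {"hour_from": "2020-04-05-05", "hour_to": "2020-04-08-04"}, the same window used in the queries in this document.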

Optional Parameters

The optional limit parameter restricts the number of rows in the result set. Its maximum value is 1000.

To retrieve only the rows that match specific dimension values, use the optional filters parameter, which maps dimension names to lists of accepted values.

query

{
  "hour_from" : "2020-04-05-05",
  "hour_to" : "2020-04-08-04",
  "dimensions" : [ "repository" ],
  "metrics" : [ "star_actors" ],
  "filters" : {
    "org_country" : [ "US", "GB" ],
    "repo_license" : [ "Apache-2.0" ]
  },
  "limit" : 5
}

results

[ {
  "repository" : "microsoft/TypeScript",
  "star_actors" : 88
}, {
  "repository" : "facebookresearch/detectron2",
  "star_actors" : 75
}, {
  "repository" : "d2l-ai/d2l-zh",
  "star_actors" : 66
}, {
  "repository" : "google-research/text-to-text-transfer-transformer",
  "star_actors" : 64
}, {
  "repository" : "minio/minio",
  "star_actors" : 64
} ]

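These are the top 5 repositories, by number of distinct actors who starred them, that belong to organizations located in the US or GB and have an Apache-2.0 license. Values listed for the same dimension are combined with OR, and different dimensions are combined with AND. As this example shows, a filter can use a dimension that does not appear in dimensions. Filter values are case sensitive.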
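The combination logic implied by this example (US or GB, and Apache-2.0) can be sketched locally in Python. The rows below are hypothetical, and the function only illustrates the matching rule, including case sensitivity; it is not how the service evaluates filters internally.

```python
# Hypothetical repository rows, for illustration only.
rows = [
    {"repository": "example/a", "org_country": "US", "repo_license": "Apache-2.0"},
    {"repository": "example/b", "org_country": "GB", "repo_license": "MIT"},
    {"repository": "example/c", "org_country": "DE", "repo_license": "Apache-2.0"},
    {"repository": "example/d", "org_country": "us", "repo_license": "Apache-2.0"},
]


def matches(row: dict, filters: dict) -> bool:
    """True if, for every filtered dimension, the row's value is one of the listed values."""
    # all(...) combines dimensions with AND; `in` over each list is an OR of values.
    # Comparison is exact string equality, so "us" does not match "US".
    return all(row.get(dim) in values for dim, values in filters.items())


filters = {"org_country": ["US", "GB"], "repo_license": ["Apache-2.0"]}
print([r["repository"] for r in rows if matches(r, filters)])  # ['example/a']
```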

Request

Run a query by sending an HTTP POST request to https://dev-source.herokuapp.com/ with Authorization and Content-Type headers and the query as the request body.

curl -X POST https://dev-source.herokuapp.com/ \
  -H 'Authorization: token XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' \
  -H 'Content-Type: application/json' \
  -d '{
    "hour_from": "2020-04-05-05",
    "hour_to": "2020-04-08-04",
    "dimensions": [ "repository" ],
    "metrics": [ "star_actors" ],
    "filters": {
      "org_country": [ "US", "GB" ],
      "repo_license": [ "Apache-2.0" ]
    },
    "limit": 10
  }'

Replace XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX with your private token and you are ready to go. The service allows one request every 3 seconds, so do not exceed that rate. These examples illustrate only a few use cases.
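The same request can be sent from Python using only the standard library. This is a sketch: it mirrors the curl command (endpoint, Authorization: token header, JSON body), adds a simple client-side throttle to respect the one-request-per-3-seconds limit, and assumes the response body is the JSON array of rows shown in the examples. The class DevSourceClient and its methods are hypothetical names.

```python
import json
import time
import urllib.request

ENDPOINT = "https://dev-source.herokuapp.com/"
MIN_INTERVAL = 3.0  # seconds between requests, per the documented rate limit


class DevSourceClient:
    """Hypothetical minimal client that sends at most one request per MIN_INTERVAL."""

    def __init__(self, token: str, clock=time.monotonic, sleep=time.sleep):
        self.token = token
        self.clock = clock   # injectable for testing
        self.sleep = sleep   # injectable for testing
        self._last = None    # time of the previous request, if any

    def build_request(self, query: dict) -> urllib.request.Request:
        """Build the POST request equivalent to the curl example."""
        return urllib.request.Request(
            ENDPOINT,
            data=json.dumps(query).encode("utf-8"),
            headers={
                "Authorization": f"token {self.token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )

    def _throttle(self) -> None:
        # Wait until at least MIN_INTERVAL seconds have passed since the last request.
        if self._last is not None:
            remaining = MIN_INTERVAL - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()

    def query(self, query: dict) -> list:
        """Send a query and return the decoded JSON rows (assumed to be a list)."""
        self._throttle()
        with urllib.request.urlopen(self.build_request(query)) as response:
            return json.load(response)
```

Throttling on the client side keeps a loop of queries under the limit without having to handle rejected requests.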

Some options, such as the maximum allowed limit and the maximum number of dimensions and metrics, may change in the future. Because this documentation is generated dynamically, it always reflects the current values.

The service is provided by baresquare.