GitHut – Programming Languages and GitHub

GitHut

GitHut is an effort to visualize and explore the complexity of the universe of programming languages used in repositories hosted on GitHub.

Programming languages are not just tools that developers use to create programs or express algorithms, but also tools to code and decode creativity. By observing the history of languages we can enjoy mankind's discovery of better ways to solve problems, facilitate cooperation between people, and reuse the effort of others.

GitHub is the world's largest code host, with 3.4 million users. It is where the open-source development community provides access to most of its projects. By analyzing how languages are used on GitHub, it is possible to understand the popularity of programming languages among developers and discover the unique characteristics of each language.

Data

GitHub provides publicly available APIs for interaction with its massive dataset of events and hosted repositories.
GitHub Archive takes this data a step further by collecting and storing it for public consumption. The GitHub Archive dataset is also available through Google BigQuery.
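
As a rough illustration of working with such a dataset, the sketch below reads events from an archive payload. It assumes the payload is gzip-compressed, newline-delimited JSON with one event per line; the function name `iter_events` and the sample payload are hypothetical and not taken from GitHut.

```python
import gzip
import json
from typing import Iterator


def iter_events(raw: bytes) -> Iterator[dict]:
    """Yield one dict per event from a gzip-compressed, newline-delimited JSON payload."""
    # Assumption: one JSON object per line, separated by "\n".
    for line in gzip.decompress(raw).split(b"\n"):
        if line.strip():
            yield json.loads(line)


# Tiny in-memory stand-in for a downloaded archive file (made-up data).
payload = gzip.compress(b'{"type": "PushEvent"}\n{"type": "CreateEvent"}\n')
event_types = [event["type"] for event in iter_events(payload)]
print(event_types)
```

Splitting on the newline byte, rather than using `str.splitlines()`, avoids breaking a record on other Unicode line separators that may legally appear inside JSON strings.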

The quantitative data used in GitHut is collected from the GitHub Archive. The data is updated on a quarterly basis.

A further note on the data concerns the large number of records that do not specify a programming language. This is especially pronounced for create events (repository creation), so it is not possible to identify trending languages in terms of newly created repositories. For this reason, the activity value (measured as the number of changes) has been considered the best metric for the popularity of programming languages.
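
The reasoning above can be sketched in a few lines. This is a minimal illustration, not GitHut's actual pipeline: it assumes each record is a flat mapping with hypothetical `type` and `language` keys, and it treats one push event as one change.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def activity_by_language(events: Iterable[Mapping[str, Optional[str]]]) -> Counter:
    """Count push events per language, skipping records with no language."""
    counts: Counter = Counter()
    for event in events:
        # Assumption: only push events count as activity in this sketch.
        if event.get("type") != "PushEvent":
            continue
        language = event.get("language")
        # Records without a language cannot be attributed, so they are dropped.
        if not language:
            continue
        counts[language] += 1
    return counts


# Made-up sample records, including ones with no language specified.
sample = [
    {"type": "PushEvent", "language": "JavaScript"},
    {"type": "PushEvent", "language": "Python"},
    {"type": "CreateEvent", "language": None},
    {"type": "PushEvent", "language": None},
    {"type": "PushEvent", "language": "JavaScript"},
]
ranking = activity_by_language(sample).most_common()
print(ranking)  # [('JavaScript', 2), ('Python', 1)]
```

Counting create events the same way would attribute nothing for the third record, which is why a metric based on repository creation is unreliable here.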

The release year of each programming language is based on Wikipedia's "Timeline of programming languages" table.

Check out GitHut's publicly available GitHub repository for more information about the data collection methodology.


