It’s been a month since I started contributing to rubygems.org. My project is to port the endpoints that bundler uses to install gems into rubygems.org itself. Did you know that
Fetching gem metadata from https://rubygems.org/
is kind of a lie? It is actually talking to http://bundler.rubygems.org/, which is a Sinatra app (bundler-api) separate from the Rails app that runs rubygems.org. Rubygems.org didn’t have the infrastructure to handle all the requests coming from every Rubyist in the world running bundle install, which is why the dependency endpoint used by bundler had to be served from a different app. Now, with the help of Ruby Together, we have the resources, and we would really like to stop maintaining two applications. It is so not fun when people can’t find newly released versions.
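For context, here is a rough Ruby sketch of the shape of what the dependency endpoint returns, as I understand the legacy format: bundler asks for something like /api/v1/dependencies?gems=foo,bar and gets back a Marshal-dumped array of hashes, one per gem version, with :name, :number, :platform and :dependencies keys. The Spec struct, the dependency_payload helper and the gems foo and bar are all hypothetical; the real bundler-api reads this data from its database.

```ruby
# Hypothetical sketch of the legacy dependency endpoint's payload
# (GET /api/v1/dependencies?gems=...). All gem data below is made up.
Spec = Struct.new(:name, :number, :platform, :dependencies)

# Returns one hash per version of each requested gem.
def dependency_payload(specs, requested_names)
  selected = specs.select { |spec| requested_names.include?(spec.name) }
  selected.map do |spec|
    {
      name: spec.name,
      number: spec.number,
      platform: spec.platform,
      # Each dependency is a [name, requirement] pair.
      dependencies: spec.dependencies
    }
  end
end

SPECS = [
  Spec.new("bar", "1.0.0", "ruby", []),
  Spec.new("foo", "2.1.0", "ruby", [["bar", ">= 1.0"]]),
  Spec.new("foo", "2.0.0", "ruby", [["bar", "~> 0.9"]])
].freeze

# The endpoint sends the array back as Marshal data, which bundler loads.
body = Marshal.dump(dependency_payload(SPECS, %w[foo]))
Marshal.load(body).map { |h| h[:number] } # => ["2.1.0", "2.0.0"]
```

Matching bundler-api’s response means producing exactly this structure, byte for byte once marshaled, so that bundler can’t tell which app answered.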
My project got a jump start because #1225 was merged. I only had to make a few changes so that its response matched bundler-api’s. In the process, I removed some things we weren’t using anymore: the rubyforgers and version_histories tables and the downloads column of the rubygems table. While the first two had not been used for a while, the downloads column had gone out of service fairly recently.
The way rubygems.org tracks gem downloads is rather interesting. If you asked me how I would track the number of downloads, I would probably say that I would keep a count field for it and increment it every time someone downloaded a gem. Given that bundler-api handles 5-7k requests per minute, that wouldn’t be a good idea. I know the request stats because Nick (@qrush) gave me access to the New Relic app of rubygems.org ❤ Thanks Nick! Data is beautiful, indeed.
What rubygems.org does instead is update downloads in bulk. Gems are served through the Fastly CDN, and rubygems.org processes the log files it generates every minute (for details, check out: Simplifying our stack). There is more: we track downloads per version. So how would you find the total downloads of a gem? I would probably sum the downloads of all its versions with an ActiveRecord query. What rubygems.org does instead is, during the bulk update, also update the row with version_id: 0:
increment(count, rubygem_id: rubygem_id, version_id: 0)
Now we don’t have to sum over all 709 versions of caboose-cms; it is a simple fetch of one row. Following the same pattern, it keeps track of the total count of all gem downloads with:
increment(total_count, rubygem_id: 0, version_id: 0)
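To make the bulk-update pattern concrete, here is a small in-memory Ruby sketch. A Hash stands in for the downloads table, keyed by [rubygem_id, version_id]: version_id 0 holds each gem’s rollup row and [0, 0] holds the grand total, mirroring the two increment calls above. DownloadCounter, bulk_update and count_for are hypothetical names; the input is assumed to be already-parsed [rubygem_id, version_id] pairs rather than raw Fastly log lines, and real version ids are assumed to be non-zero.

```ruby
# Hypothetical in-memory stand-in for the downloads table. Keys are
# [rubygem_id, version_id]; version_id 0 is a gem's rollup row and
# [0, 0] is the grand total across all gems.
class DownloadCounter
  def initialize
    @counts = Hash.new(0)
  end

  # Applies one batch, e.g. a minute of already-parsed CDN log entries.
  # Each entry is a [rubygem_id, version_id] pair for a single download.
  def bulk_update(downloads)
    per_version = downloads.each_with_object(Hash.new(0)) { |key, acc| acc[key] += 1 }

    per_version.each do |(rubygem_id, version_id), count|
      increment(count, rubygem_id: rubygem_id, version_id: version_id)
      # Keep the per-gem rollup in step, so a gem's total is one lookup.
      increment(count, rubygem_id: rubygem_id, version_id: 0)
    end

    # Same idea one level up: a single row for all downloads of all gems.
    increment(downloads.size, rubygem_id: 0, version_id: 0)
  end

  # version_id defaults to 0, i.e. the gem's rollup row.
  def count_for(rubygem_id:, version_id: 0)
    @counts[[rubygem_id, version_id]]
  end

  private

  # Same call shape as the increment calls quoted above, but in memory.
  def increment(count, rubygem_id:, version_id:)
    @counts[[rubygem_id, version_id]] += count
  end
end

counter = DownloadCounter.new
counter.bulk_update([[1, 10], [1, 10], [1, 11], [2, 20]])
counter.count_for(rubygem_id: 1, version_id: 10) # => 2
counter.count_for(rubygem_id: 1)                 # => 3
counter.count_for(rubygem_id: 0)                 # => 4
```

The trade-off is a couple of extra writes per batch in exchange for every read becoming a single-row lookup, which suits a site where download stats are read far more often than a batch is written.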
My contribution related to all this was removing the “show all versions” link below the download stats on the gem show page.
I guess I haven’t made any significant changes so far. David (@dwradcliffe) has suggested that I pick up the metadata migration, which has me excited! I will be changing the way people build gems... or at least writing the code that makes it happen. We also plan to add features to our search system. We use Elasticsearch on rubygems.org, and I am pretty sure it can do much more than the basic search we have right now.
Lastly, I suggest you check out upsert. I got to know about it through Arthur (@arthurnn); SQL seems to be his thing. It was only introduced in PostgreSQL 9.5 (as INSERT ... ON CONFLICT), and I think it’s really cool. I can’t wait until we use it on rubygems.org.
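As a rough illustration of where upsert could fit, the hypothetical helper below builds a PostgreSQL 9.5+ INSERT ... ON CONFLICT statement that adds a batch of pre-aggregated download counts in one round trip, creating any rows that don’t exist yet. This is not how rubygems.org does it today: the gem_downloads table and its columns are assumptions, and ON CONFLICT (rubygem_id, version_id) only works if there is a unique index on those columns.

```ruby
# Hypothetical helper: builds a PostgreSQL 9.5+ upsert for a batch of counts.
# Assumes a gem_downloads(rubygem_id, version_id, count) table with a unique
# index on (rubygem_id, version_id), which ON CONFLICT needs.
def upsert_downloads_sql(counts)
  raise ArgumentError, "no counts to write" if counts.empty?

  # counts is a Hash keyed by [rubygem_id, version_id], so each key appears
  # once; PostgreSQL rejects a statement that updates the same row twice.
  values = counts.map do |(rubygem_id, version_id), count|
    # Integer() raises on non-numeric strings, so only integers reach the SQL.
    "(#{Integer(rubygem_id)}, #{Integer(version_id)}, #{Integer(count)})"
  end

  <<~SQL
    INSERT INTO gem_downloads (rubygem_id, version_id, count)
    VALUES #{values.join(", ")}
    ON CONFLICT (rubygem_id, version_id)
    DO UPDATE SET count = gem_downloads.count + EXCLUDED.count
  SQL
end

puts upsert_downloads_sql({ [1, 10] => 2, [1, 0] => 2, [0, 0] => 2 })
```

EXCLUDED refers to the row that was proposed for insertion, so existing rows get the new batch added to their count while missing rows are simply inserted.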