Rubygems.org and Me

It’s been a month since I started contributing to rubygems.org. My project is to import the endpoints used by bundler to install gems into rubygems.org. Did you know that

Fetching gem metadata from https://rubygems.org/

is kind of a lie? It is actually talking to http://bundler.rubygems.org/, and that is a Sinatra app (bundler-api), separate from the Rails app of rubygems.org. Rubygems.org didn’t have the infrastructure to handle all the requests coming from all the Rubyists in the world running bundle install, which is why the dependency endpoint used by bundler had to be served from a different app. Now, with the help of Ruby Together, we have the resources, so we would really like to not have two applications to maintain. It is so not fun when people can’t find new version releases.

My project got a jump start because #1225 was merged. I only had to make a few changes so that the response matched the one from bundler-api. In the process, I removed some of the things we weren’t using anymore: the rubyforgers and version_histories tables and the downloads column from the rubygems table. While the former two had not been used for a while, the downloads column had gone out of service fairly recently. The way rubygems.org tracks downloads of gems is rather interesting. If you asked me how I would track the number of downloads, I would probably say that I would keep a count field and increment it every time someone downloaded a gem. Given that bundler-api handles 5-7k requests per minute, that wouldn’t be a good idea. I know the request stats because Nick (@qrush) gave me access to the New Relic app of rubygems ❤ Thanks Nick! Data is beautiful, indeed.


What rubygems.org does is update downloads in bulk. Gems are served over Fastly’s CDN, and rubygems.org processes the log files it generates every minute (for details, check out: Simplifying our stack). There is more… We track downloads by version. So, how would you find out all the downloads of a gem? I would probably say that I would sum the downloads of all the versions of a gem with an ActiveRecord query. What rubygems.org does instead is that, during the bulk update, it also updates the version_id: 0 row with:

increment(count, rubygem_id: rubygem_id, version_id: 0)

Now we don’t have to sum over the 709 versions of caboose-cms; it is a simple fetch of one row. Following the same pattern, it keeps track of the total count of all gem downloads with:

increment(total_count, rubygem_id: 0, version_id: 0)
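To make the version_id: 0 trick concrete, here is a minimal in-memory sketch. `DownloadCounter`, its Hash store and the `bulk_update` method are hypothetical stand-ins and not rubygems.org’s actual code; only the idea of a per-gem rollup row at version_id: 0 and a grand total at rubygem_id: 0 comes from the description above.

```ruby
# Hypothetical in-memory stand-in for the downloads store.
class DownloadCounter
  def initialize
    @rows = Hash.new(0) # key: [rubygem_id, version_id]
  end

  # Apply one batch of counts parsed from the CDN logs.
  # batch is an array of [rubygem_id, version_id, count] triples.
  def bulk_update(batch)
    batch.each do |rubygem_id, version_id, count|
      increment(count, rubygem_id: rubygem_id, version_id: version_id) # per version
      increment(count, rubygem_id: rubygem_id, version_id: 0)          # per gem rollup
      increment(count, rubygem_id: 0, version_id: 0)                   # all gems
    end
  end

  # One lookup instead of a sum over every version.
  def gem_total(rubygem_id)
    @rows[[rubygem_id, 0]]
  end

  def grand_total
    @rows[[0, 0]]
  end

  private

  def increment(count, rubygem_id:, version_id:)
    @rows[[rubygem_id, version_id]] += count
  end
end

counter = DownloadCounter.new
counter.bulk_update([[1, 10, 5], [1, 11, 2], [2, 20, 4]])
puts counter.gem_total(1) # 7
puts counter.grand_total  # 11
```

The cost is three writes per batch entry instead of one, in exchange for reads that never have to aggregate.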

My contribution related to all this is that you will no longer see the “show all versions” link below the download stats on the gem show page.


I guess I haven’t made any significant change so far. David (@dwradcliffe) has suggested that I pick up the metadata migration. That has me excited! I will be changing the way people build gems... at least I will be writing the code that will make it happen. We also have plans to add features to our search system. We are using elasticsearch on rubygems.org, and I am pretty sure it can do much more than the basic search we have right now.

Lastly, I would suggest that you check out upsert. I got to know about it through Arthur (@arthurnn); SQL seems to be his thing. It was introduced only in PostgreSQL 9.5 and I think it’s really cool. I can’t wait until we use it on rubygems.org.
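For the curious, here is roughly what an upsert for a download counter could look like. This is only a sketch: the `gem_downloads` table, its columns and the `upsert_downloads_sql` helper are made up for illustration, and a unique index on (rubygem_id, version_id) is assumed. The `INSERT ... ON CONFLICT ... DO UPDATE` syntax itself is what PostgreSQL 9.5 added.

```ruby
# Builds a PostgreSQL 9.5+ upsert statement for one counter row.
# Table and column names are hypothetical; a unique index on
# (rubygem_id, version_id) is assumed to exist.
def upsert_downloads_sql(rubygem_id, version_id, count)
  # Integer() raises ArgumentError on anything that is not a number,
  # so only integers ever get interpolated into the statement.
  values = [rubygem_id, version_id, count].map { |v| Integer(v) }
  <<~SQL
    INSERT INTO gem_downloads (rubygem_id, version_id, count)
    VALUES (#{values.join(', ')})
    ON CONFLICT (rubygem_id, version_id)
    DO UPDATE SET count = gem_downloads.count + EXCLUDED.count
  SQL
end

puts upsert_downloads_sql(1, 0, 42)
```

If the row exists, its count is bumped; if not, it is created, all in one statement and without a separate "does it exist?" query.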


Running Tests Should be Fast!

Let me name some of the worst things in the world: the final season of Two and a Half Men, stubbing your toe at night, meme text that is too small to read, and test suites where you have to wait 5+ seconds before any of your tests run.


On OSEM, I wanted to improve the test coverage, which shows up on the Coverage Status badge. Red is not really my favorite color (unless it is in a git diff). If you don’t see red anymore, you know what happened 😉

It was really slow to run tests. I introduced spring, and it did save me file load time but it wasn’t good enough. The aha moment was when I got:

The following factories are invalid: (FactoryGirl::InvalidFactoryError)event_commercial – Validation failed: Network is unreachable – connect(2) for “www.youtube.com” port 80 (ActiveRecord::RecordInvalid)

I have gotten that error exactly once. Which kind of makes me worry about how rarely I am offline.

You would think we must have introduced webmock and moved on. But wait! There is more. I also found out that we can just decouple running FactoryGirl lint from the test suite. Until now, the lint was running before test suite:

RSpec.configure do |config|
  config.before(:suite) do
    FactoryGirl.lint
  end
end


As the README of FactoryGirl suggests, I moved lint to a rake task and added it to Travis. In my head, it made a lot of sense. For me, running FactoryGirl lint before every suite is like running rubocop before every request to localhost:3000. As issue #772 on FactoryGirl very rightly points out... what if I am doing TDD? No! TDD is not dead. I am not saying I use TDD, but running the complete lint when I am running a single test is an overhead I can’t bear. Maybe it only amounts to a couple hundred milliseconds, but those are milliseconds I shouldn’t have to lose.

Some of the valid arguments against my opinion were that we would potentially be writing tests on broken factories and that we would now have to run lint manually. It would indeed suck to write 200 lines of tests only to find out that your factories are not correct. I think that can be solved to an extent by integrating lint with a pre-commit hook.

In the end we were able to find a common ground by moving FactoryGirl lint behind an ENV variable:

RSpec.configure do |config|
  config.before(:suite) do
    FactoryGirl.lint if CONFIG['factory_girl_lint']
  end
end


What would be the value of your ENV['OSEM_FACTORY_LINT']?
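If you are wondering how such a switch could be read, here is a small sketch. It is not how OSEM wires its CONFIG hash; the `factory_lint_enabled?` helper and the list of values that count as "on" are assumptions for illustration.

```ruby
# Interprets OSEM_FACTORY_LINT as a boolean switch.
# Which strings count as "on" is an assumption of this sketch.
# env defaults to the real ENV, but any Hash-like object works,
# which keeps the helper easy to test.
def factory_lint_enabled?(env = ENV)
  %w[1 true yes on].include?(env.fetch('OSEM_FACTORY_LINT', '').strip.downcase)
end

puts factory_lint_enabled?('OSEM_FACTORY_LINT' => 'true') # true
puts factory_lint_enabled?({})                            # false

# In spec_helper.rb this would gate the lint, for example:
#   FactoryGirl.lint if factory_lint_enabled?
```

Defaulting to "off" keeps single test runs fast, while CI can export the variable and still catch broken factories.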


Why use GitHub?

A few days back, I had the honor of introducing someone to GitHub. What has me wondering is that it wasn’t a newcomer; rather, I explained what GitHub is to someone who has been in the software development field for quite a while.

Are there real people out there who haven’t heard the word “GitHub” before? Following is what I replied to the person of interest:

Hi,

“Please also belabor for me in as many words as your will prefers, why it will be worth my time to find more about GitHub.”

This should be interesting. The basics would be that it is a remote git hosting service. Each repository is given a project space. Each project has a bug tracker associated with it, a space for pull requests (git patches), milestones, tasks, changelog etc. Since public projects are free to host on GitHub, it has become the de facto hosting platform for open source projects.
Why should you care? Every organization needs a tool for all its developers to collaborate. GitHub can be that tool for you. On GitHub you can create an organization space and have your developers be part of it. You can define permission levels within the organization. Projects can have permission levels as well. So basically, if your organization doesn’t want to deal with the hassle of maintaining a git server and a collaboration tool on top of it, you can just let GitHub manage it all. BTW, if you are one of those miserable souls who have to use SVN, GitHub supports that as well.

 

[Image: privateinvestocat. Illustration by jeejkang]
Okay, enough of the mundane talk. Let me introduce you to the interesting stuff. GitHub is the place we developers hang out. You can follow others and see what they are working on, which is really cool cause everyone is on GitHub. You can follow the developers you worship and see what they have worked on, and if they inspire you enough, you can work alongside them (cause open source!!). It is also the place for the happening things. Think of any new technology which has come out in the last 5 years; it is most likely being developed on GitHub. AngularJS, jQuery, Bootstrap, Go, Rust, Rails, React, homebrew, node and everything else, all of it is on GitHub.
Moving on to libraries and frameworks!! All the awesome code others have already written, repositories we love ❤ Have you really never dealt with a bug in a library you wanted to use? Using GitHub, you can ask the maintainers of that library to fix the bug for you. In fact, they will appreciate that you took the time to report it. redis and emscripten from C, httpie and thefuck from Python, elasticsearch and RxJava from Java, tensorflow and mongo from C++, jekyll and devise from Ruby, and once again the list goes on. All of them are developed on GitHub.
Cheers,
Aditya

 

I know GitHub is much more than what I wrote. I missed the platform it provides for publishing your awesome hack: HomeMirror, building courses for everyone: FreeCodeCamp, a collection of free books: free-programming-books, a command-line murder mystery: clmystery, all German federal laws and regulations (seriously?): gesetze. I really don’t think I will ever be able to describe this wonderland and not miss something really cool. If there are more of you out there, I hope you will find this useful and take GitHub for a spin. We love GitHub, I hope you will too. I really can’t say it any better than what James said in his Dear GitHub post:

 

“Dear GitHub,

You have done so much to grow the open source community and make it really accessible to users. Somehow you have us chasing stars and filling up squares, improving the world’s software in the process.”


Bug which depended on other bug in a different library

EDIT: I had only figured out part of the problem. You can read more detailed explanation here.

I was finally able to solve the bundler error on Spring. Yay me!

It all started when bundler began cleaning ENV["RUBYLIB"], a change that was released with bundler 1.11.0. Spring failed to `require "bundler/setup"` cause it was depending on bundler leaving that RUBYLIB path uncleaned. That require wasn’t supposed to fall back to the RUBYLIB path in the first place; it was supposed to find it in GEM_PATH. However, GEM_PATH was an empty string when one chose to change the default bundle install path for their app. I don’t know yet if that is the desired behavior, I guess I will look into it.

I am glad I took the time to figure it out. Now I have a much better understanding of how ENV and $LOAD_PATH work. Hey, did you know that ENV.delete("key") returns the value of the deleted key? Also, the subtle difference between `dup` and `clone`? This example from the Ruby docs sums it up well:

class Klass
  attr_accessor :str
end

module Foo
  def foo; 'foo'; end
end

s1 = Klass.new #=> #<Klass:0x401b3a38>
s1.extend(Foo) #=> #<Klass:0x401b3a38>
s1.foo #=> "foo"

s2 = s1.clone #=> #<Klass:0x401b3a38>
s2.foo #=> "foo"

s3 = s1.dup #=> #<Klass:0x401b3a38>
s3.foo #=> NoMethodError: undefined method `foo' for #<Klass:0x401b3a38>
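The ENV.delete behavior mentioned above is just as easy to check for yourself. A tiny sketch follows; `DEMO_RUBYLIB` is a made-up variable name, used so the real RUBYLIB stays untouched.

```ruby
ENV['DEMO_RUBYLIB'] = '/tmp/demo/lib'   # hypothetical variable and path
removed = ENV.delete('DEMO_RUBYLIB')    # returns the value that was stored
puts removed.inspect                    # "/tmp/demo/lib"
puts ENV.key?('DEMO_RUBYLIB')           # false
puts ENV.delete('DEMO_RUBYLIB').inspect # nil, there is nothing left to delete
```

That return value is handy when you want to clear a variable temporarily and restore it afterwards.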

You should check out RailsConf 2015 - Breaking Down the Barrier: Demystifying Contributing to Rails from Eileen Uchitelle. She explains the use of `caller` and `TracePoint`. I found them really useful while understanding the flow of control in Spring. She also demonstrates the use of `git bisect`. Turns out, it is not as intimidating as I thought it would be.


I contributed on rails

Not really? Well, I contributed to rails/spring. It is a gem which Rails apps use to preload their files so that every time you run the rails console or tests, you don’t have to wait until all the files are loaded. I had never worked with threads before, but then again, there is a first time for everything. In fact, it is also the first gem repository whose code I have read end to end. I have contributed to other gems, but I only read the class or module I was working on.

Michael Grosser (@grosser) was really helpful and prompt with the review of my PRs. Both of my PRs have gone through a lot of scrutiny; still, I am happy that at least one of them is merged. While reading the gem, I realized that Rails had kept my exposure limited. In Rails one doesn’t have to deal with attr_accessible, require, dependencies etc. You can be as sloppy as you want and still everything will work.
Even after all the scrutiny, my PR managed to break the gem’s functionality. Have you ever used Bundler.setup? $LOAD_PATH? These are a few more things which you would probably never use in a normal Rails app. Apparently `require "bundler/setup"` is extensively used in gems. It checks your Gemfile and overrides the $LOAD_PATH with the things it found there.
This is why I feel overwhelmed all the time. I am always afraid that I will break something, that I don’t know what I am doing. I start feeling that Michael and Jon must be thinking that I am an idiot and that I am wasting their time. In the time they spent reviewing my PR, they could probably have made the change themselves. The least I can do is be thankful to them for putting up with me.

When people say that you should read others’ code, they are right. You get to learn about different coding styles and many other cool things. For example, I found out that ActiveSupport has this `strip_heredoc` method which I could have used when I was testing markdown on glittergallery. I had just written a lot of pluses (rubocop, max-LineLength: 80) and a lot of `\n`.

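require "active_support/core_ext/string/strip" # provides String#strip_heredoc

# Writes the class definition to the file with the heredoc's indentation removed.
File.write("path_to_file.rb", <<-RUBY.strip_heredoc)
    class Foo
        def self.omg
            raise "omg"
        end
    end
RUBY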

The above code will write the content between <<-RUBY… RUBY to the file you mentioned, in a well-formatted manner.
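Outside of Rails, the same effect is available in plain Ruby since 2.3 through the squiggly heredoc (`<<~`), which strips the common leading whitespace on its own. A quick sketch, with `foo_source` being a name made up for the example:

```ruby
# <<~ removes the indentation shared by all lines of the heredoc,
# so the source can stay indented along with the surrounding code.
def foo_source
  <<~RUBY
    class Foo
      def self.omg
        raise "omg"
      end
    end
  RUBY
end

puts foo_source
```

The relative indentation inside the heredoc is kept, so `def self.omg` still ends up two spaces in.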

 


GlitterGallery now running on VPS

Project: GlitterGallery

We recently completed our work on the implementation of push over the ssh protocol. Do check out our demo: glittergallery-dev and report any issues you come across. You can find implementation details in PR #303. Now users can add ssh keys in their profiles and save themselves the hassle of entering credentials every time they push. It also means that we now support sparkleshare. Following are the steps you need to follow:

  • Go to settings -> click ssh key tab -> add a name of your key and paste your sparkleshare key -> click on add key
  • Make a new project
  • Open sparkleshare and go to add hosted projects. In address type: ssh://git@glittergallery-dev.fedorainfracloud.org and in remote path: /<username>/<project_name>.git
  • Click add and wait. Your repo will be synced in a moment.

#293 Nginx & Unicorn from Ryan was a great resource on setting up the VPS.

Right now, we are working on adding the diff styles of design-with-git. We might not be using the code of design-with-git, just the ideas. We are focusing on opacity, mask, toggle and side view. I will keep you guys posted.


Git over ssh

Project: GlitterGallery

We had a minor setback with the implementation of git protocols. I worked on the git https protocol, but later I found out that sparkleshare only supports the ssh protocol. Until now we were planning to host on OpenShift. I needed access to the ~/.ssh/authorized_keys file for git ssh to work, but OpenShift doesn’t give away that access. Time to move to a VPS. Kevin got me set up with one and Pingou helped me figure out a few details.

First I needed to make changes to our web interface so that users can add their public keys to their profiles. This also meant the addition of a keys model and the generation of fingerprints for keys. The next thing is validation of keys when a push or pull is made over ssh. This involves two steps, namely authentication and authorization. The OpenSSH server handles the authentication part, and for authorization I have set up a git shell, which makes an API call to glittergallery to check user access. Besides authorization, the git shell also limits ssh access to git related commands.
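As an illustration of the fingerprint step, here is a minimal sketch of the classic MD5 fingerprint: the MD5 digest of the Base64-decoded key blob, printed as colon-separated hex pairs. The `ssh_fingerprint` helper and the demo key are made up, and this is not necessarily how GlitterGallery computes it.

```ruby
require 'digest'

# Classic MD5 fingerprint of an OpenSSH public key line
# ("type base64-blob comment").
def ssh_fingerprint(public_key)
  blob = public_key.to_s.split[1]
  raise ArgumentError, 'not a public key line' if blob.nil?

  raw = blob.unpack1('m') # Base64-decode the key blob
  Digest::MD5.hexdigest(raw).scan(/../).join(':')
end

# Not a real key: the blob is just Base64 for the bytes "demo-key".
demo_key = "ssh-rsa #{['demo-key'].pack('m0')} alice@example"
puts ssh_fingerprint(demo_key)
```

Storing the fingerprint next to the key gives a short, stable identifier to show in the UI and to look keys up by.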

The git shell I am using is just a fork of gitlab-shell. I am hoping that I won’t need to make any changes to it; however, we won’t be supporting all the features of gitlab-shell (git-annex and git-lfs) yet.


Awesomeness of Rack

Project: GlitterGallery

I got away with setting up authentication without having to set up an API for our app. Rack::Auth::Basic made it dead easy to ask for credentials and verify users. Since we are doing everything over the smart http protocol of git, users can verify themselves with just their username and password. I think I still have to add the part where a user can use either their username or email, but that shouldn’t be too difficult.
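To show what basic auth boils down to, here is a plain Ruby sketch that does by hand what Rack::Auth::Basic takes care of for you: decoding the `Authorization: Basic ...` header into a username and password and checking them. The `USERS` hash and both helpers are hypothetical; a real app would compare against hashed passwords in the database.

```ruby
# Hypothetical user store, for illustration only.
USERS = { 'aditya' => 's3cret' }.freeze

# Extracts [username, password] from a Basic Authorization header value,
# or returns nil when the header is missing or malformed.
def basic_credentials(header)
  scheme, encoded = header.to_s.split(' ', 2)
  return nil unless scheme&.casecmp?('basic') && encoded

  user, pass = encoded.unpack1('m').split(':', 2) # Base64 of "user:password"
  user && pass ? [user, pass] : nil
end

def authorized?(header)
  user, pass = basic_credentials(header)
  !user.nil? && USERS[user] == pass
end

header = "Basic #{['aditya:s3cret'].pack('m0')}"
puts authorized?(header)            # true
puts authorized?('Basic bm9wZQ==')  # false, decodes to "nope" with no password
puts authorized?(nil)               # false
```

Since the credentials are only Base64-encoded, not encrypted, this scheme is only reasonable over https.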

We support the addition of files from our web UI as well, so it was a bit tricky to sync the bare repository and the non-bare one (we use the latter for web UI interaction). A lot of things go on after a user does a push. First the obvious: we update the bare repository. Next we do a fetch from the non-bare one and merge the new changes (how to link), then we go through each commit and generate a thumbnail for it, and finally, if the image for the inspiration page has not been generated yet, we create that too.

Now there is just one thing left to do, integration of sparkleshare.


The Rabbit Trail: Hosting Gitserver with Rails on OpenShift

We people of GlitterGallery have been trying to figure out how we can add local repo support #161. Until now if a user wanted to put his awesome work for everyone to see and admire, he would have to use our web interface. Which can get really cumbersome, really fast.

Unfortunately there isn’t much documentation about how one should go about setting up a git server with a Rails app. We will be running on OpenShift, so no poking around with apache config files either. I got started by reading these two chapters: Git Internals – Plumbing and Porcelain and Git on the Server of the Pro Git Book (if only all good things were free). Just when I was trying to figure out how on earth I was going to implement git-http-backend without having access to the server, nice guy Marek pointed me to Grack. So next up was learning more about Rack, and I got to do it while hearing the pleasant voice of Ryan Bates: #151 Rack Middleware, #222 Rack in Rails 3 and #317 Rack App from Scratch!
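If Rack is new to you as well, the core idea fits in a few lines: an app is anything that responds to `call(env)` and returns a status, headers and a body, and middleware is an app that wraps another app. The sketch below needs no gems; `GitOnly` is a made-up middleware for illustration and not part of Grack.

```ruby
# A Rack app: responds to call(env) and returns [status, headers, body],
# where body responds to each.
hello_app = lambda do |env|
  [200, { 'content-type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}\n"]]
end

# Middleware wraps another app. This invented one only lets through
# paths that look like they point into a .git repository.
class GitOnly
  def initialize(app)
    @app = app
  end

  def call(env)
    if env['PATH_INFO'].to_s.match?(%r{\.git(/|\z)})
      @app.call(env)
    else
      [404, { 'content-type' => 'text/plain' }, ["Not a git repository\n"]]
    end
  end
end

app = GitOnly.new(hello_app)
status, _headers, body = app.call('PATH_INFO' => '/aditya/demo.git/info/refs')
puts status    # 200
puts body.join # Hello from /aditya/demo.git/info/refs
```

In a real setup the same object would be handed to `run` in a config.ru, and the env hash would be filled in by the web server.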

We have our push and pull working now, however we need to work on authentication. I am thinking Rack::Auth::Basic and grape.


Good People of Ruby-SIG

In the last fedora-infra meeting, Kushal pointed out that it is important that the gems I am using on my project are packaged in Fedora. I was taken aback by this, because I had no idea that such a thing as an rpm package of a ruby gem existed. Others were quick to guide me to resources to get me started: Infrastructure/AppBestPractices, Packaging:Guidelines and Packaging:Ruby

I decided to contact the people who must have been through all of this: Ruby-SIG. You can imagine that the ruby community on Fedora must be small, we are all python people here. However, the response I got was just overwhelming. Each one of them was so detailed and resourceful in their reply. Ken Dreyer made me a complete road map which I should follow if I plan on packaging ruby gems. He is the maintainer of quite a few ruby gems. Then there was Dominic Cleal, who explained how he dealt with this issue on his project foreman.

The reason why GlitterGallery would benefit from packaging is the distribution of the application. People could easily install it and run it (by starting SystemD unit for instance). The other one is that those people would get security updates for their application automatically with system updates (yum update) which they have to run anyway… so it’s a lot of work for the maintainer and least effort for people running it. With packages you can also easily ship SELinux policies, properly state all system dependencies etc.

That would be Josef Stribny, who explained why I should care that the gems I am using come as rpm packages.

The obvious problem with all this is that packaging takes a lot of time and it will end up delaying the deployment of GG in production. I hope we will find a middle ground.

Coming to the things I worked on in the last week: I added responsive images for the desktop and the mobile site. RMagick’s resize_to_fill came in really handy for generating different sizes of images. I also finished my work on authorization. We changed our landing page to the exploration page, so now our site is quite engaging even for guest users. I also added sourcemaps for the sass files; otherwise it was really difficult to work with a single 1k-line stylesheet file.
