Sourcegraph
This is a post about my favourite engineering productivity tool and how to set it up. I feel bad writing about it, because it feels like stealing. I cannot be alone using this tool in the way I will describe, but it is so great - I just cannot keep it quiet. I hope I don’t get a cease & desist, because, honestly, more engineers with large codebases should know about it and use it.
The tool is Sourcegraph. My previous company could not purchase a licence for the whole team, so I have been using it as a single-user docker container, running on my laptop, to search through hundreds of internal repositories. It has been a massive productivity boost. Before explaining the setup I would like to say a few words about the product and the quiet impact the company’s culture has made on me, even though I have never worked there. You can skip to the instructions if you want.
Product
Sourcegraph is a code intelligence platform with many great features, but at its core it is a code search engine that is criminally easy to try out. For public and open source repositories you can just run a query on the public Sourcegraph instance. For example, this one finds instances of the max_length parameter being passed in a models.py file in a specific repo. I won’t get into details, but structural search is orders of magnitude better than a grep or regex search (both of which Sourcegraph supports too).
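To give a feel for what such a query looks like, here is a minimal sketch that assembles one from its parts. The repository name github.com/example/app is a placeholder, the helper name structural_query is made up for illustration, and the pattern is only one plausible way to express "max_length is being passed". Whether structural search is enabled also depends on the Sourcegraph version you run.

```shell
# structural_query REPO_REGEX FILE_REGEX PATTERN
# Hypothetical helper: prints a Sourcegraph query that restricts the search to
# one repository and file name and switches the pattern type to structural.
structural_query() {
  printf 'repo:%s file:%s patterntype:structural %s\n' "$1" "$2" "$3"
}

# Placeholder repository. ':[[value]]' is a structural-search hole that matches
# a run of alphanumeric characters or underscores.
structural_query '^github\.com/example/app$' 'models\.py$' 'max_length=:[[value]]'
```

The printed line can be pasted into the Sourcegraph search box as is.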
Sourcegraph also provides a quickstart single-container docker image to try it out in single-user mode, with either a time-limited full feature set or just code search. As it happens, either I was savvy enough or no one is talking about it, but I ran it on my laptop and, with a little extra effort, plugged in the browser and IDE extensions as well as the command line tool.
What’s so cool about it? Here’s a couple of experiences from last year.
6pm on a weekday evening, someone sends a desperate message on Slack - a part of the website returns 500, can someone help? I know nothing about that part of the codebase, but generally I know how to debug stuff. Plus, I recently indexed all of the repositories on my local Sourcegraph container - an opportunity to test drive it. I jump on a huddle and it takes us less than a minute to find the exception string in the logs. I paste it into my local Sourcegraph - voila! Two places that raise the exception. Two more clicks through references later, I have narrowed it down to one of them as the most likely culprit. From that result I could navigate through the history of the file, see who changed it, and also walk through the references of the methods in other files. At that point someone familiar with the codebase joined the call, I told them about my findings, and they took it from there and fixed it. But the experience stuck with me: something was broken, I came in with zero knowledge of that part of the codebase, and it took me about a minute to identify a lead for a fix.
In another instance, we were planning to migrate a dependency from one client library implementation to another. One of the first tasks was to identify every place the old client methods were called across a few hundred repositories. I wrote a structural query, used the command line tool, and in five minutes had a list of the repositories, files, and line numbers that we would have to change, and by proxy could also tell which teams it would impact.
Beyond code search, Sourcegraph also has an automated change management feature. You write a small configuration file, describe a change to make & repositories to apply it to, and Sourcegraph clones the repositories, applies the change, creates pull requests, and tracks its progress in a small neat dashboard. I am a fan of automated refactoring, and while I don’t have a lot of opportunities to do it, I had one at the time this Sourcegraph feature was still in beta and available for free to try even on the single user setup. Again, it was a simple thing, but it saved probably an hour of my time.
While you can tell I enjoy the product, a single docker container is there to try the product out, not to provide a highly available experience. At one point, the in-memory search index consumed more than 50% of my RAM, so I could not run the service constantly in the background. On the other hand, it takes less than a minute to start. Trade-offs. There was not a day I did not wish we had a real instance, but unfortunately I was not the one making the decisions. I hope this post convinces someone else what a great product Sourcegraph is. And while GitHub’s code search feature is quite new, I think Sourcegraph is the absolute leader in this, and with all the other features they added in recent years they are well ahead.
Culture
Interestingly, Sourcegraph is not only open source. Their internal processes and culture documentation are also publicly available.
I was casually browsing the pages about engineer onboarding and manager resources. Turns out as part of onboarding you have to read two books and discuss them. I like reading books, so I took the recommendation. The same books are now on my recommended reading list too.
The first one is called Turn The Ship Around - a story about how the worst-performing crew of a US Navy nuclear submarine became the best (and by a vast margin) within a year. The author walks the reader through that transformation and breaks it down, so that it can be applied in a business environment.
I also like to experiment with ideas, so I tried applying some of the ideas from the book within my own team. Judging by the feedback I got and the results we achieved, it was an objective success. The team engaged with and owned more decisions, and we shipped more impactful stuff. It helped me redefine my own role in the team.
The other book - Orbiting the Giant Hairball - is hard to describe. Written tongue in cheek but in a non-pretentious way (and if you can get a hardback version - it’s absolutely worth it for the doodles), it talks about fostering creativity in a bureaucratic environment (think - medium & big companies). Personally, my favourite story was the one about a conference organiser who assembled a great team that did everything, so she didn’t need to lift a finger. It gave me food for thought about what it means to be a productive leader. Is it important to be seen leading or for the results to be visible?
It still baffles me that some of the biggest influences on my tech lead & managerial philosophy were books I found by browsing the website of a product I really like, when the product has nothing to do with people management. It is just that their culture documents are open. I think there should be more of this.
As my good friend says: “now, let’s get to the gravy.” Hope it helps you navigate your codebases!
Instructions
Limitations:
- Single machine, single user - everything runs on your laptop.
- Not all extensions work (e.g. Sentry doesn’t support custom domain)
- Can terminate occasionally (although in practice that happened maybe twice in two weeks for me, so it’s not annoying - you just restart the container)
- This has been tested on macOS, but should not be much different on Linux.
Prerequisites: docker and docker compose.
Steps
Adjusted from the official quick-start guide:
- Clone this repository
- If you want the most recent version of the docker container, update the image tag in the docker compose file to the most recent tag here.
- Run docker-compose up - this will create some defaults in the ~/.sourcegraph/ directory, we’ll adjust them later.
- Go to https://localhost:7080 and follow the instructions to go through the setup process, create the admin account, and import repos.
Now you have the basics set up, and that’s a good starting point. However, a lot of the power comes from various browser and code editor extensions. They require SSL, so I recommend setting that up.
Setup SSL
Setting up SSL is not necessary, unless you want to use the browser extension or the command line tool. The extension can give additional information while browsing code, whereas the src command can help query your Sourcegraph instance from the command line.
The instructions are adapted from the official Sourcegraph self-signed SSL documentation page. It adds a fake root CA so that the self-signed certificate can be validated. This is required for the command line tool to work - as it validates the sourcegraph.local certificate chain.
- docker compose stop - if it was running.
- brew install mkcert - install mkcert, a simple tool for creating locally trusted certificates, written by Filippo Valsorda, a cryptographer working at Google on the Go team.
- sudo CAROOT=~/.sourcegraph/config mkcert -install - create a root CA.
- Create the certificate:
  sudo CAROOT=~/.sourcegraph/config mkcert \
    -cert-file ~/.sourcegraph/config/localhost.crt \
    -key-file ~/.sourcegraph/config/localhost.key \
    sourcegraph.local
- Allow your user (and sourcegraph container) to read the certificates:
  sudo chown $USER ~/.sourcegraph/config/root* ~/.sourcegraph/config/localhost.*
- Update the ~/.sourcegraph/config/nginx.conf file to look similar to the one here
- Update the /etc/hosts file with the following line:
  127.0.0.1 sourcegraph.local
- Run docker compose up
- Open https://sourcegraph.local:7443/site-admin/configuration
- Update the site-wide configuration to look like the site-configuration.js one in the repository.
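If you end up repeating the setup, the /etc/hosts step is the one that is easy to apply twice. Below is a minimal sketch of a helper that only appends the line when the hostname is not listed yet. The function name ensure_hosts_entry is my own invention, and it only skips lines that start with a comment marker, so treat it as a starting point rather than a robust hosts-file editor.

```shell
# ensure_hosts_entry FILE IP HOST
# Hypothetical helper: appends "IP HOST" to FILE unless HOST already appears
# as a hostname on a line that does not start with '#'. Prints what it did.
ensure_hosts_entry() {
  if awk -v host="$3" '
       !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == host) found = 1 }
       END { exit !found }
     ' "$1"; then
    echo "already present"
  else
    printf '%s %s\n' "$2" "$3" >> "$1"
    echo "added"
  fi
}

# Try it on a scratch copy first; writing to the real /etc/hosts needs root.
# cp /etc/hosts /tmp/hosts.test
# ensure_hosts_entry /tmp/hosts.test 127.0.0.1 sourcegraph.local
```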
Troubleshooting
There may be some errors in the logs, but as long as the system seems to be working - they can be ignored. However, if it does not start / is not accessible…
Database Connection Errors (e.g. Setting up postgres failed, database “sourcegraph” does not exist)
Run commands:
docker compose rm
rm -rf ~/.sourcegraph
docker compose up
This will delete all local files that Sourcegraph has created in ~/.sourcegraph/. If the database files were corrupted - this will start things from scratch.
Socket timeout errors (especially from Docker)
On my machine, Docker for Mac sometimes loses track of containers - and is unable to run them, or they become non-responsive. This happens only if I run Sourcegraph together with multiple instances of other containers. I just restart Docker for Mac and then start Sourcegraph container again - it seems to help.
The Good Stuff
Link GitHub.com
If you explicitly list public github.com repositories to clone, your local Sourcegraph can index various dependencies, so that a search for HttpRequest does not stop at from django.http import HttpRequest, and you can investigate it deeper.
- Go to https://sourcegraph.local:7443/site-admin/external-services/new
- Choose GitHub and follow the instructions to generate a token.
- Update and adjust the configuration to look similar to the github-repositories.js example here.
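For orientation, a GitHub code host configuration that lists public repositories explicitly might look like the sketch below. The url, token, and repos fields belong to Sourcegraph’s GitHub code host configuration; the repository list, the GITHUB_TOKEN variable name, and the helper function are assumptions for illustration. This is a sketch, not the exact file from the repository.

```shell
# Hypothetical helper: prints a minimal GitHub code host configuration.
# GITHUB_TOKEN is an assumed variable holding the token generated in the
# previous step; a visible placeholder is printed when it is not set.
github_code_host_config() {
  cat <<EOF
{
  "url": "https://github.com",
  "token": "${GITHUB_TOKEN:-<your-token>}",
  "repos": [
    "django/django"
  ]
}
EOF
}

github_code_host_config
```

Paste the printed JSON into the configuration editor and extend the repos list with the dependencies you want indexed.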
Sourcegraph Extensions
Extensions are enabled by visiting https://sourcegraph.local:7443/extensions and toggling them on. Here are ones I tried:
- codecov - I could not make AJAX requests get past Cloudflare Access.
- sentry - currently supports ‘sentry.io’ domain only.
- git-extras - can show blame inline (or on hover), a bit more convenient than just a sidebar.
- vscode-extras - opens the file you’re viewing in Sourcegraph - in VSCode. Very useful! Requires setup, see user settings.
VSCode Extension
- Find the extension sourcegraph.sourcegraph and install it.
- Add "sourcegraph.url": "https://sourcegraph.local:7443" to your VSCode settings file.
This extension does not replace code intelligence, instead it allows users to select an identifier in VSCode and execute the “Sourcegraph: Search” action from the VSCode command palette. It opens the Sourcegraph search results for that identifier in a new browser tab. Combined with the vscode-extras Sourcegraph extension this is actually fantastic, as you can seamlessly navigate between the browser and the editor as you explore the codebase.
Chrome Shortcut
Following the instructions allows you to have a search shortcut in the browser (I use sg). For example, if I want to search for something, I open a new tab and type in the address bar: sg file:requirements.txt django== and it will find all requirements files with a django dependency, where you can see a list of versions being used by different repositories.
Command Line Tool
src-cli allows you to run search queries from the terminal + some extra stuff.
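As a minimal sketch of that setup: src-cli reads the instance URL and an access token from the SRC_ENDPOINT and SRC_ACCESS_TOKEN environment variables. The token value below is a placeholder (create a real one in your Sourcegraph user settings), and sg_requirements is a hypothetical wrapper name I picked for the example; it reuses the requirements query from the Chrome shortcut section.

```shell
# Point src-cli at the local instance set up above.
export SRC_ENDPOINT="https://sourcegraph.local:7443"
# Placeholder: replace with an access token created in the Sourcegraph UI.
export SRC_ACCESS_TOKEN="${SRC_ACCESS_TOKEN:-replace-with-your-token}"

# Hypothetical wrapper: search requirements files for a given dependency.
sg_requirements() {
  src search "file:requirements.txt $1"
}

# Usage (needs src-cli installed and the container running):
# sg_requirements 'django=='
```

Because the command line tool validates the certificate chain, this only works once the mkcert root CA from the SSL section is installed.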