A friend of mine asked me recently what my “standard” day of work at Netflix consists of. I had to explain to him that it’s hard to talk about a “standard” day, as each day sees me looking at different pieces of our infrastructure and brings different challenges to solve. Still, I explained that there are a few common denominators throughout a day at Netflix, and in fact throughout a day of work for anyone in software development. One major common denominator, as I explained to him, is that we software engineers (“coders”, as we are often labelled) spend lots of hours a day looking at lines of code and producing more lines of code. Then testing it, checking if it works, tweaking it a bit, trying it again and so on until we reach perfection, or code nirvana. At this point we are happy with our work and ready to move on to the next task. (Which more often than not is to deploy that code onto servers, but perhaps I’ll talk about that bit in a separate blog post. For now, I will just concentrate on the fact that we spend a lot of our time in front of countless lines of code, written by us or by someone else.)
“Do you guys not get bored?” asked my friend, having listened to me talk about this.
I wrote before about what I think of handwriting: I think it is a dying form of communication, losing out to visual communication. See also my previous post on whether handwriting is something I need to keep in my brain or not, where I talked about the fact that one picture can communicate the same amount of information in milliseconds. With writing, my brain has to give all the commands to my hands to “encode” the message, then I send a text/email to my friend, who has to engage his brain to decode the letters, assemble the phrase, decode the phrase again and finally interpret its meaning. A single image encodes all of that, and making it visual makes it much easier for the brain to decode.
This has had me obsessing lately over whether it’s just me thinking this way or whether this is actually happening. And just when I start thinking that maybe it’s just a silly thought of mine, something like this happens:
I recently worked at Netflix on a project which was hitting one of our Cassandra clusters. (By the way, we use Cassandra here a lot; wherever possible we prefer it to an RDBMS, so we have tons of instances running Cassandra.) Part of what my code had to do was retrieve a set of records, apply some transformation to one field, then write the result to an output file. It is such a simple ETL that I didn’t spend too much time on it initially and simply wrote code which ran a CQL (Cassandra Query Language) query to retrieve the fields I needed, applied the processing and wrote the output file line by line.
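To make the shape of that first, naive version concrete, here is a minimal sketch in Python. It is not the code from the project: the function name `run_etl`, the column names and the sample rows are all made up for illustration, and the rows are plain dictionaries standing in for whatever a CQL query that selects only the needed columns would return.

```python
import io
from typing import Callable, Iterable, Mapping, Sequence, TextIO


def run_etl(rows: Iterable[Mapping[str, str]],
            field: str,
            transform: Callable[[str], str],
            columns: Sequence[str],
            out: TextIO) -> int:
    """Read rows one by one, transform a single field, write one line per row.

    `rows` is a stand-in (assumption) for the result of a query that
    selects only the columns listed in `columns`.
    Returns the number of records written.
    """
    count = 0
    for row in rows:
        record = dict(row)                       # copy so the input is untouched
        record[field] = transform(record[field])  # the one-field transformation
        # One comma-separated line per record; no quoting, to keep the sketch short.
        out.write(",".join(record[c] for c in columns) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data in place of a real Cassandra result set.
    sample = [{"id": "1", "title": "abc"}, {"id": "2", "title": "def"}]
    buffer = io.StringIO()
    written = run_etl(sample, "title", str.upper, ["id", "title"], buffer)
    print(written, "records written")
    print(buffer.getvalue(), end="")
```

Everything here happens strictly one record at a time on a single thread, which is fine for a handful of rows and is exactly the kind of design that starts to hurt once the volume grows.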
Of course, in doing so, I missed one important aspect: the volume of data (ouch!). This ETL is set to process about 100 million records, and even though my code makes sure I only retrieve the columns that I want and not the full row (which would flood the network with a whole bunch of Cassandra columns I have no use for!), it still dragged like a snail when I first ran it! (I did a quick calculation at the time and it would have taken something like 3-4 days to finish. Ouch!!)
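For a rough sense of scale, the two figures above (about 100 million records, 3-4 days) imply a throughput of somewhere around 290 to 390 records per second. The snippet below is just that back-of-the-envelope arithmetic written out; the helper name `implied_rate` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_rate(records: int, days: float) -> float:
    """Average records per second needed to process `records` in `days`."""
    return records / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    total = 100_000_000  # "about 100 million records"
    for days in (3, 4):
        print(f"{days} days -> {implied_rate(total, days):.0f} records/second")
```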
OK, so if you haven’t been watching my activity on GitHub you might have missed this, and as such I feel it deserves a full-on blog post. Recently, having joined Netflix, I started using some of their libraries, as is to be expected. One of the things that I have used pretty much from day one here is the Genie library. To quote from Genie’s page on GitHub:
Genie is a federated job execution engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Presto, Sqoop and more. It also provides APIs for managing many distributed processing cluster configurations and the commands and applications which run on them.
As you can probably figure out from the above, I’m using Genie for querying some of our Hive datastores. And in doing so, I’m using the Genie client code which Netflix provides with this package, available on GitHub: https://github.com/Netflix/genie/tree/develop/genie-client
However, having looked at the sample code they provided, I realised it can actually be improved. I spoke with the folks here who look after the Genie project and it quickly transpired that the client library is indeed in need of some lovin’. So I set off and put together a pull request (https://github.com/Netflix/genie/pull/116). This has now been merged into the main trunk; however, I think it needs a bit of attention, as I’ve seen code from this project used elsewhere which could be improved based on the changes I put together in that pull request. This blog post will walk you quickly through these changes. If you are using pieces of the client code from GitHub, it might be worth reviewing your code to see if my changes can be applied in your project too.
I have recently started using Gradle, which I have to confess I find to be bliss compared to Maven. Maybe because I prefer a Groovy-based syntax for build configuration rather than Maven’s XML-based configuration file. Or maybe because I feel the Gradle peeps have somehow made the tool a bit easier to use than Maven. Or maybe because the integration with IDEs seems to be cleaner. And I could probably go on, but you get the idea: after some initial playing with it, I’m digging it.
And while using Gradle here at Netflix, one of the things I started looking at is how to add a bit more automated defect detection to our code base in Ads Engineering, in order to improve our code quality. Of course, for those of you who are familiar with code coverage tools and the like, Checkstyle, FindBugs and so on spring to mind. And all of these tools have (rather nice!) Gradle plugins, which makes this task a bit of a breeze!
I wrote before about how bad mobile advertising occasionally gets nowadays (see this blog post here). That was about 6 months ago, and I was hoping things had started to change in the meantime. (After all, in the Valley we hear a lot about how quickly the world changes, right? :D) Well, it appears not: some companies are still stuck in the silly mentality of using in-app advertising just to annoy the hell out of the user until they buy some premium service. And today I encountered another case of really, really bad advertising.
This one is extremely bad because it’s not just nagware where you get advertising shoved in your face at each step until you pay for a premium service. (And by the way, that in itself makes a really bad case for advertising, because it is stating upfront to the user: “look, we know advertising doesn’t work (soooo not true, by the way!) and we don’t really value it either, but we know you hate it and we’ll keep shoving it in your face, not because we hope you might be interested in the products we advertise, but because we hope you will get so annoyed with it that you buy our premium service to get rid of it”. In other words, advertising is not used to trigger user interest in other products or services, but instead just to annoy the s#@$t out of the user.) No, in this particular case advertising actually prevents the user from using the application!
This is another speech I’ve given at Valley Toastmasters and it’s part of the Competent Communicator manual, project #10: “Inspire Your Audience”.
(The speech is actually based on a TED talk given by Terry Moore titled “Why is ‘x’ the unknown” and you can watch it here: http://www.ted.com/talks/terry_moore_why_is_x_the_unknown )
As a reminder, the objectives of this project are:
With that in mind, here’s the speech I delivered.