Jay's Blog

Thursday, 30 August 2018

New Website!

You can find me at my new website over at https://www.jaybosamiya.com/.

At the moment, my plan is to maintain work/research/security/etc related blogposts on that website. I will not be removing those blogposts from this blogspot website, but will no longer be publishing those here. Go over there to all my latest blogposts, and a regularly maintained list of ways to contact me.

Thursday, 31 May 2018

My Awesome Experience at the Summer School on Formal Techniques (SSFT'18)

I spent the past week, at the Summer School on Formal Techniques, and it was an absolutely amazing experience. Chronicled below, are the different great talks/labs that were part of this week long program, as well as my thoughts interspersed in. Overall, this was an unforgettable week, where I learnt a lot, made new friends, and had some nice discussions about a lot of very interesting topics. I also found out about some things that I had no idea about before, and thus I decided to write at least a short summary down for each of the different things that happened, so that maybe it might be useful to some (or more likely me from the future) as references for things to read up on.

Group Photo at SSFT'18

I initially heard about SSFT via a message on our Project Everest Slack. After all, two of our team (Nikhil Swamy and Jonathan Protzenko) were going to give talks and conduct tutorials at this summer school. Since I had only recently (approximately a year ago) started my journey into Formal Methods/Techniques, this seemed like the perfect opportunity to widen my field of view in this field- a field which seems to be able to cover almost every topic known in Computer Science. As a person who started from a more "systems-y" background, I (rightly) thought that this would give me the requisite theoretical background to dive deeper into this field, in addition to giving me practical hands on experience with other tools and techniques used in this field. I'd been using F* for almost a year by now, but had yet to have had a chance to mess around with other tools that are commonly used (except for Dafny, which is a beautiful language to start off in this field btw). The fact that approximately half the time was dedicated to the labs, surely was a bonus!

The Summer School was held at Menlo College in Atherton, California, from Sat May 19 to Fri May 25, 2018. I must thank SRI for organizing it, and NSF for funding our travel/stay/etc. The organization of the entire summer school, as well as how everything was managed, was absolutely impeccable. Oh, and a special shout-out goes to the chefs at Menlo College who made absolutely marvelous food that kept us powered up through this week long intense mental workout!

As for the actual summer school, it started off with an optional background course, titled "Speaking Logic", which ran over the first two days (i.e. Sat/Sun). Despite it being optional, it seems like most of us at the summer school attended it, and it definitely was worth it. N Shankar took us through a beautiful journey through the fundamentals and principles of logic and formal methods. Along this journey, we made stops along Naive Set Theory, Propositional Logic, Cook's Theorem, Reductions to SAT, Proof Systems, Minimal Logic, and Meta-Theorems. We also stopped off at Interpolation, Resolution, First Order Logic, SMT Solving, the Tarski-Knaster Theorem, and Bounded Model Checking. Alongside these (already amazing) topics were the awesome detours that Shankar took, with his own anecdotes and stories. A bunch of these even led to some intense discussions during and after the lectures! We did do a bunch of proving things in PVS, and it definitely seems like a great tool to get people started with some serious formal proofs :)

After this deep dive into Logic, Stéphane Graham-Lengrand took us on an adventure through Type Theory, where the Proofs-as-Programs paradigm reigns king. Type Theory is the foundation for a wide variety of interactive theorem provers, such as Coq, Agda, Lean, Matita, etc. During this talk, we went over the Lambda Calculus and Simple Types, Intuitionistic logic (and Constructivism), Computing with Intuitionistic proofs, HOL with proof-terms, Dependent types, and more, all building up to the final bomb that blew our minds- Homotopy Type Theory. Stéphane used an extremely well thought out analogy, using a Vacuum Cleaner Power Cord, to explain the basics and motivation for Homotopy Type Theory (HoTT). I'd always wondered why people were excited when talking about equalities, but with just this one example, Stéphane has ensured that I am definitely going to dive into this topic, much deeper, sometime soon! BTW, this entire talk was punctuated with proofs done in Coq, and the exercises for the same, can be found here.

As day 2 came to an end, some of us realized that we'd covered a lot of ground, and that we'd need some time to digest it, so we went back over our notes and the slides, trying to ensure that we'd actually got some of the details straight. BTW, the speaking logic slides are not only available online, but are also quite detailed, and I would recommend the interested reader to take a look at these to get an overview of what logic and proofs and formal methods are about. However, the slides do not include all of those amazing anecdotes etc, that were there during the actual talk, so maybe you should consider joining the summer school when it happens next? ;)

Starting from Day 3 (i.e. Monday), we had a change of pace, with the schedule changing to a bunch of talks until mid-afternoon, followed by a couple of lab-sessions.

Monday's talks started with a wonderful talk by Emina Torlak, about Rosette. The talk, titled "Solver-Aided Programming" was about a programming model that integrates solvers into the language, which provides constructs for program verification, synthesis, and more. A strong emphasis was placed on the paradigm of "Verify, Debug, Solve, Synthesize", which are implemented in Rosette as powerful user-friendly constructs. Since Rosette is built on top of Racket, we get all the benefits of Racket (and thus, all the LISP goodness), which leads to a very elegant way to program using solvers. The four parts of the paradigm can be thought of as the following queries that any user might want to ask a language: (1) Verify: "find me an input on which the program fails, or prove that it cannot fail", (2) Debug: "localize the bad parts of the program", (3) Solve: "Find values that repair this failing run", and (4) Synthesize: "Find code that repairs the program".

The next talk was by Nikhil Swamy, about F*. It explained how there is a gap between interactive proof assistants (such as Coq) and semi-automated verifiers of imperative programs (such as Dafny), and that F* was about bridging that gap. F* has an ML-like syntax, and has dependent typing. What this means is that verification is done via the process of type checking. The rest of the lecture continued with examples showing (simple) proofs written in F*, for functional programs, as well as a discussion on how we can write proofs for effectful code, using Monadic effects, and by modeling the heap. Personally, despite having worked with F* for almost a year now, I still gained some new insights, such as the fact that subtyping being possible due to the core feature of refinements being proof irrelevant after finishing the proof.

Next up, after lunch, was a talk by the unforgettably energetic Mooly Sagiv, titled "Modularity for decidability of deductive verification with applications to distributed systems". It took us through the motivation of why decidability was crucial for verification, and that despite seemingly restrictive (since the logic then becomes less expressive), we are able to talk about and prove whatever interesting properties that are needed, at least in the case of distributed systems. Specifically, this decidable deductive verification was demonstrated using Ivy. The basic idea behind this is that when a verifier goes into the "Unknown" state (i.e., it is divergent; aka "I can't decide"), this gives very little (if any) information back to the person working with the verifier, since the property being verified might either be true, or might have a counter-model (read: counter-example), but no further information is known. Instead however, if we were to restrict ourselves to a logic that is always decidable, then we are guaranteed never to reach the "unknown" case. The talk then went on to some examples of verification in this decidable world (and how to represent some things that are "not expressible", but turn out to be expressible when you look at them from a different perspective). What was really interesting was how many fewer lines of proof were required per line of code in Ivy, in comparison with other tools. More details about how it is able to do so, would be in the following lecture.

After a short break, we had the lab sessions, where we split off into 2 groups (using a great way to introduce randomness btw: if the first letters of both your first and last name lie in the same half of the alphabet, then you belong to group A, else group B). All the groups had all the lab sessions, just in a different order; since I was in Group A, my order shall be based off of that.

Our first lab session was by Emina Torlak, where we got to dive into working with Rosette. As a warmup, we worked on finding and fixing a bug in a tiny BitVector example. As a larger example, we worked on Sudoku. Starting only with a checker (i.e., a program that, given a Sudoku solution, results in a "yes it is a valid solution" or "no it is not"), we turned it into a Sudoku solver (find a solution, given a puzzle), a validity checker (checks if a Sudoku puzzle has exactly 1 solution), and minimal puzzle generator (a puzzle that has the least number of constraints needed to be valid). What was extremely interesting was how easy it was to build each of these using just the checker. Rather than gain domain knowledge about how to write solvers, generators etc., we simply piggy-backed on the already implemented checker, and with only a single-digit number of lines of code, were able to implement all of this! Unfortunately, I do not know of a public link where this tutorial is available, but it is a very well designed and thought out tutorial that I'd recommend people try if they can find it.

Next up was the lab session by Mooly Sagiv, about Ivy. Here, he gave a live demonstration of using Ivy to verify a protocol for mutual exclusion. In contrast with previous experiences with verification, this was a welcome change, where the tool itself gives you a graphical counter-example, when it is unable to prove that a certain invariant holds. While it still falls upon the user of the tool to provide stronger invariants that can be proven via the induction, being able to see these counter examples instead of simply a "timed-out" as is more likely in other tools, is definitely a game changer

The next day (Tuesday), we started off with a talk by Andreas Abel, about Agda- the dependently typed functional programming language which is a proof assistant. The material for what was covered can be found here. The talk had a strong "do this live" approach, where Andreas was explaining things as he continued working on Agda code, proving things. We also went over an elegant representation and ordering invariant for binary search trees in Agda. This was based off of Conor McBride's paper "How to keep your neighbours in order". Personally, I found that Agda had a very elegant interface which made it more natural to think about proofs, and this made understanding the proofs themselves a lot easier.

The next talk was by Emina Torlak. Yes, almost every speaker gave 2 talks and had 2 lab sessions. This time, the talk was about how to build a solver aided language. The classic (read: hard) way to build such a tool is to build a symbolic compiler, which requires an expertise in Programming Languages, Formal Methods, and Software Engineering. A much easier method would be to build an interpreter for the language, and have something that uses this to build all our tools for us. This kind of tool is a significant technical challenge and is what Rosette solves for us. We can simply have a (deep or shallow) DSL hosted in Rosette (where, since Rosette is built on top of Racket, this becomes very easy), and then we can easily build the 4 tools- verify, debug, solve, and synthesize, very easily. The talk then went into details about how this massive technical hurdle is overcome (since neither symbolic execution, nor bounded model checking, are up to the exact requirements of this task of precise symbolic encodings). It is done via type-driven state-merging. This was followed by 3 different use-cases (out of many many others) where Rosette has been used to get some very interesting results.

The last talk before the break before the labs, was by Jonathan Protzenko, titled "Verified low-level programming embedded in F*". Specifically, the talk was about a low-level subset of F*, named Low*, which allows one to write and reason about low-level C code. The talk goes into detail about how different parts of C are modeled in Low*, as well as talks about the kreMLin compiler, which compiles Low* code to readable C. Since proof-erasure is a part of this process of compiling down to C (because C has no notion of proofs by itself), we only need to use the low-level subset in the actual computation bits, and can use the full power of F* in the proofs. There is a tutorial for Low* available online (currently a work-in-progress).

The first of the labs on the same day was part 2 of Emina Torlak's lab. In this, we worked on actually building DSLs- first a shallowly embedded DSL, followed by a deep embedded DSL, for a circuit-based programming language. An important thing to note here is that shallow DSLs are faster and easier to implement, and are almost always the right choice for either the "verify" or the "debug" operations. However, if we want any sort of "synthesize" operation, then shallow DSLs fall short extremely quickly, and deep embeddings work out much better.

The second lab on Tuesday was part 1 of the F*/Low* Lab, conducted by Nikhil Swamy. Here, we went across some select examples from the F* Tutorial. Special emphasis was placed on how one comes up with these proofs, and how F* is able to infer most of the proof, requiring only a "here's the form of the induction" argument. One interesting point that came up during this discussion was the possibility for F* to automatically "guess" the form of the induction. This seems quite doable in a large number of cases, and is something that is worth considering, though F* doesn't support this at the moment, and one needs to explicitly point out things like "do an induction on this list" (and the rest of the proof is automatic). The tutorial is well written and thought out in a way to be self-contained, while not repeating concepts that might be known to an OCaml / F# developer. There are some nice motivating examples/stories in the tutorial online too. I would definitely recommend the interested reader to try these out (though some people might need to take a crash course on OCaml / SML / F# syntax, if they aren't used to it).

With this, Tuesday came to a close, and we started Wednesday with a talk by Dirk Beyer, about CPAChecker. The talk, titled "Configurable Software Model Checking - A Unifying View" was about how one can unify different techniques, ideas, and algorithms in program analysis and verification, into a single framework, that also allows for the creation of "intermediate" algorithms. This ability to configure allows one to move all the way between imprecise but scalable Data-flow analysis, all the way to the precise but expensive Model Checking, by better combination of abstractions. Via Dynamic Precision Adjustment, one gets a better fine tuning of abstractions and gets adjustable precision, thereby allowing a similar movement between precise/expensive and imprecise/scalable. By using an adjustable block encoding, one can change how many statements are handled together all at once. The rest of the talk revolved around CPAChecker's features, concepts, ideas, algorithms, as well as architecture. We'd get to play around with it in the lab soon! Oh, and btw, CPAChecker has regularly been doing well in the SV-COMP Competition on Software Verification, coming in with Gold in 2018!

Next up, part 2 of Andreas Abel's talk. He continued with the "doing it live" approach, but this time, we were looking at examples from programming languages, like representation of expressions, evaluation, and equational reasoning. Again, the files can be found here. In my opinion, understanding these things well definitely involves following along, proving these things in Agda, so I would 100% recommend interested readers take a look at the Lec2.zip file on the website, and try proving stuff. The comments in these files are really helpful, especially if you've already gone over the files from Lecture 1. I have to specially point out how amazing it is that time was put in to make sure these examples were extremely well documented, and is thus easy enough for a beginner to follow along. Being at the talk though helped a lot too, and probably saved a bunch of time for many of us in understanding these ideas.

And then right after lunch, part 2 of Mooly Sagiv's talk. This time, the talk was much more about the technical details of Ivy and its 3 most important principles: (1) What 1^st order structures would exist in the language -- abstract states and imperative updates. This gives us a "step towards decidability". (2) Theories as add-ons. When the user axiomatizes domain knowledge in EPR (which is what gives the decidability), soundness is checked, and we get reusable domain knowledge with predictable automation. (3) Modularity for breaking quantifier alternation cycles. It falls upon the user to break these cycles, and Ivy will only point out that there are cycles, instead of trying to break them by itself. One very interesting idea that falls out of this is how sometimes it becomes useful to perform the abstraction of functions as relations instead.

As for the labs for Wednesday, we started off with the first of the Agda labs, by Andreas Abel. In this lab, we worked on some simple definitions and proofs in Agda. I'd strongly recommend going over the exercises which are quite self-contained, and have some interesting bits that one might get stuck upon before realizing where to go next.

The next lab was the first of the CPAChecker labs, by Dirk Beyer. Here, we followed along parts of the CPAChecker Tutorial, which is also very self-contained. It contains a bunch of nicely chosen examples which help in identifying different features of CPAChecker, and has a nice progression to it.

And with that, Wednesday draws to an end. By now, we've all had a lot of interesting ideas, works, thoughts, anecdotes, stories, and concepts explained to us- and while for some, it might have seemed like an overwhelming amount, to most of us (or at least to me), this was a treasure trove of knowledge in a short, condensed, concentrated form. Personally, I was having an amazing time here. I must also bring up the fact that the rest of the students at the summer school were also a major factor in this, since everyone was brimming with ideas, and worked on such varying topics, that no matter who you were talking to, there was always something amazing to learn.

Anyways, back to the talks- we are now on Thursday. We start the day with a talk titled "Verifying Properties of Binarized Deep Neural Networks", by Nina Narodytska. With Machine Learning (and more specifically, Deep Neural Networks) becoming the rage and the norm for a lot of industries, it becomes essential to actually try to understand properties about them, especially about robustness to perturbation. In this talk, a specific class of neural networks is taken and is studied via an encoding to Boolean Satisfiability (i.e. SAT). Once it is encoded at a SAT problem, one can then leverage the full-power of research done in making better SAT solvers, to be able to scalably verify properties of these binarized deep neural networks. An arXiv pre-print of the work can be found here.

Next up was the second talk by Dirk Beyer, which was split into 4 parts (again, the material can be found here). One of the parts was about how one might combine different verifiers etc. Currently, we have a lot of verifiers, due to competitions such as SV-COMP, but it would be great to be able to leverage the strengths of one to help the other. This is where conditional model checking comes in. A really elegant "reducer" based construction was shown, which allows one to basically "import" any verifier and make it use the results of a previous conditional verifier. Another part was regarding verification with witnesses, which was about witness validation, as well as stepwise refinement of witnesses. This is another case where multiple verifiers could work in conjunction to either aid each other, or to provide higher assurance in each other's results. Another idea was about execution based validation of witnesses- this one is especially important when talking to people outside of the verification community, since they respond best to "here's a test case that breaks the software" rather than "here's a set of paths which might lead to a break in the software".

Finally, the last of the talks of the day was by Gordon Plotkin. This talk, titled "Some Principles of Differentiable Programming Languages" was an absolute beauty! In this talk, he walked us through what Differentiable Programming Languages are, and why they are necessary (as well as difficult). He then went on to explain some previous foundational work that might have been useful, if it weren't for those pesky partial functions. Following along, he went along his thought process in how he worked on designing the right language which has differentiation as a fundamental operation in it (despite having conditionals and looping). Personally, I found it extremely fascinating to see the thought process of reaching such an elegant language in the end. It starts off extremely messy, and keeps getting messier until suddenly, beauty emerges at the end. There are a whole bunch of notes that I wrote down during this talk, but I believe nothing can do justice to summarizing this talk. It was filled with lots of interesting side-notes, and anecdotes, and ideas that themselves could be talks of their own.

Back to the labs: we started Thursday's lab session with part 2 of the Ivy lab, by Mooly Sagiv. At this time, we got to actually play around with and truly understand the though process behind coming up with inductive invariants while proving things in Ivy. We specifically took up the example of leader election in a ring, and tried to prove that at the end of the protocol, exactly one leader would be elected. Due to Ivy's fast turn-around time to respond to the written invariants, as well as its relatively easy to understand graphical representation of counter examples, it became much more about identifying stronger and stronger properties that might help us in proving the property we want, rather than fighting with the prover, which seems to happen in a lot of automated proof systems. At the end of finding the proof, we also end up having a very nice birds eye view of the proof, since we've written these nice inductive invariants, by the time we are done.

Next lab for the day: part 2 of the F*/Low* Lab, conducted by Jonathan Protzenko. Here, we were proving correctness and some properties of some short examples about working with machine integers, references, and buffers. The code which we were working on, is heavily commented, and is extremely easy to follow along after the F* tutorial. I personally would recommend doing it, to quickly get a handle on the basics of Low*. These examples however, should soon become a part of the (currently work in progress) Low* Tutorial.

Next up, the Banquet. Everyone had an awesome time at this. And here's a picture of almost everyone who was there (looks like some people are missing from the photo, probably because the photo was taken very close to end of day):

At the Banquet

Finally, we arrive to the last day of the summer school (did a whole week go by so fast?!). We start off the day with 2 lab sessions (yep, labs instead of talks at the start of the day).

The first was part 2 of the CPAChecker lab, by Dirk Beyer. Here, we continued with the tutorial, but this time instead, we concentrated on the parts about combining verifiers, and about witness generation/checking. Personally, I found the reducer generating (arguably) readable code, which was interesting (and unexpected). Overall though, this lab helped cement the ideas that were discussed in the second talk by Dirk.

The second was part 2 of the Agda lab, by Andreas Abel. This time, we looked over more definitions and proofs in Agda (see Exercises2.agda). In the course of this, we also ended up learning about various proof styles that are possible in Agda, especially when talking about auxiliary "helper" lemmas. Proving decidability of various things was a nice exercise and is something I'd definitely recommend trying.

As a fitting end to the summer school, was a talk by Edward A. Lee titled "What Good are Formal Methods?". Based loosely around his book, titled "Plato and the Nerd", he walked us through the nuances of how combining different deterministic systems can lead to non-determinism, and how moving between the viewpoints of a scientist and an engineer is essential in looking at the right way at models. Everything we say about, and prove about systems, is always actually about a model of the system, and depending on the viewpoint, either the model is flawed (scientist), or the system/realization is flawed (engineer). He then goes on to talk about non-falsifiable theories, with the "Digital Physics" hypothesis taken as a prime example. another interesting direction was about the Incompleteness of Determinism, which he shows via the concept of "Superdense Time". This talk was definitely an amazing journey through a lot of different ideas and concepts from a variety of different fields, that we do not even think about on a regular basis. I am definitely going to read the book, because if it is anything like the talk, it should definitely be a joy to read!

And finally, we come to the end of this brilliant, beautiful and amazing week. I got a chance to meet such awesome people, discuss mind-shattering ideas, talk about random topics in great depth, and make some great new friends and acquaintances. I hope to stay in touch with as many of you as I can, and hope to meet again sometime really soon!

Tuesday, 4 July 2017

A Story to Tell : My Experience at IIT Roorkee

[This was originally published on Expectations IITR, run by Geek Gazette, just after completing my B.Tech. at IIT Roorkee]

My desk is definitely not this organized

Truly, time flies extremely fast when you are enjoying yourself. These 4 years have been amongst the most memorable ones in my life till date, and have helped me grow as an individual in countless ways. First year was the first time I was staying away from home, for an extended period of time. To make things more challenging, I was at the same school for the previous 12 years! To come to a new place, and not a single familiar face to be found on campus, I definitely was anxious before arriving on campus. But my fears were unfounded, and I quickly found out, that in a campus with thousands of people, there are bound to be those who think alike, and we tend to find each other as time passes.

Amongst my neighbours and branch-mates, I found friends whom I could go for trips and treks with. And since I had a massive collection of movies and TV shows that I'd brought from home on my laptop, frequent marathons were bound to happen. At the time, RJB didn't have a good internet connection (NoLAN, as we liked to call it), and so we ended up purchasing an Ethernet switch, and long LAN cables, so that we could all play Counter Strike. Even then, it was often more fun to just move chairs into a single room, and continuously curse while shooting in either de_dust2 or $1000$.

Soon enough after joining IITR, different campus groups started their recruitment sessions. After asking around, doing some reading, and understanding what the different groups do, I decided to apply to only two -- Geek Gazette and SDS PAG. GG was a place for the geeks of the campus to unite, and try to change the world, one bit at a time. As I had tried writing short stories in the past, I applied as an editor to the magazine. It was during the interviews that I realised, that I had found my kind -- people who are extremely (almost obsessively) passionate about gaining knowledge about the things they love, and like to share those things with the world. Over time though, for me, it grew into a family where I could implicitly trust everyone in it. Other than GG, I applied and got selected into SDS PAG, which seemed like an obvious choice for me, since I had already been doing competitive programming since class 11 (Informatics Olympiads etc). Here was a place I found people like me, who lived on the "adrenaline rush" of solving puzzles and challenges through the use of logic, algorithms, and code.

Academics, of course, went on, alongside all these amazing things, but overall, I didn't find it too much of a burden. As long as I stayed awake in class, I was able to manage great grades without needing to spend too much extra effort outside of class. And the great thing was, we only had ~25 contact hours a week, which meant that I could do a lot more than just merely academics. Hence, apart from all that I was doing, I decided to take a shot at open-source development. One of my seniors had said "doing a Google Summer of Code in first year is impossible", and I just couldn't accept this as true. After a few months of intense work, I got selected into, and that is how I spent most of my summer after 1st year (and the first couple of months of 2nd year) -- working on the open-source network security scanner, Nmap. It was an amazing experience, and I got to work with some of the most amazing developers and security enthusiasts. Even though it was not required, it was so much fun that I was spending over 70 hours a week!

In a few months though, once the GSoC period was complete, I realised how tiring pure development could be. Additionally, 2nd year had begun, and with it, came a whole bunch of new hobbies. This is when I first heard of CTFs -- a Capture The Flag contests, where many challenges are given to hack into, legally. This seemed like awesome fun, and with a few friends, I decided to try it out. Little did I know, at the time, how important this would turn out to be. Soon enough, we were starting to get relatively decent at this "new" field, and we started taking part in national level competitions -- our first, being Deloitte CCTC, where we went and won! The thrill of breaking into software was something my teammates and I got addicted to, and over time, it decided the fate of our careers. However, at the time, it was merely a hobby.

As a serious career prospect, I was considering research in Computer Science, but was unsure of the sub-field. One of the professor's course was really fun, and I decided to ask him for some research project work. Prof Bala suggested that I work with him, and Prof Partha (who had newly joined the department), on a topic in Image Processing and Computer Vision -- Scene Text Segmentation. This led me on a long journey of actually learning what research entails -- lots and lots of failed attempts, and that one final satisfying"Aha!" moment. It was great, and for the 2nd year summer, I decided to apply for foreign university internships in the field. Maybe it was bad timing, or maybe I needed to learn more, or maybe... -- whatever the reason -- I was met with mainly unanswered emails, and the occasional rejection. I took these rejections, at the time, as a way to have some extra focus on the project at hand. Over the course of the summer, with more improvements, I was able to obtain publishable results, and decided to write a research paper about it. It was accepted at the Asian Conference on Pattern Recognition, and I ended up travelling to Malaysia in the 3rd year to present it. That trip was amazing, and I still tend to use one of the photos clicked there as a profile picture on some websites.

As part of enjoying life though, I think my friends and I had matured by then, and had started going for longer treks, and started playing "serious" games such as AoE2. Of course, a trebuchet is the weapon of choice to lay a siege with, since it utilises a counterweight to launch a 90kg projectile over 300m! Obviously, it is superior to the measly catapult, but I digress.

By the time 3rd year rolled around, I was made the president of GG, and with that, came a whole slew of new experiences and responsibilities. I was able to lead an amazing team, with great friends by my side. Frankly, it was something I thoroughly enjoyed. It did take up basically all the time I had available, but I still managed to get a lot of other things done too. One of these, was to start up a new group on campus for security enthusiasts like me. This was the birth of SDS InfoSecIITR -- a group conducive to hackers, with the purpose of pushing IITR's security culture to new heights, mainly through conducting and taking part in CTFs.

While InfoSecIITR was still in its infancy, we were still trying to figure out what might be the best way forward. However, with the experience of leading GG, I had realised that we need to strike the right balance between planning and action, in order to get optimal results. Too much planning and it is a waste of time, and it is the same for unplanned action. With this in mind, we began to come up with a rudimentary plan of action with only a small number of members (though it was an open group and anyone could join). Over time though, as 4th year rolled by, it would grow by leaps and bounds (with over 100 first yearites showing up for some meetings), and we would have come up with a much more concrete plan of what to do next.

As we started taking part in more and more CTFs over the year, I realised that there were some teams, at an international level, that did consistently well. One of these is PPP -- a team from the Carnegie Mellon University, that has consistently won almost all the most difficult CTFs that happen each year. Looking into their structure (mainly to figure out how to help InfoSecIITR progress more), I found that it has a faculty advisor who does research in the same field -- software security. Wait, seriously?! You could take this hobby up as a full career? Even though I was reading papers in security before, I had suddenly found out that academia considered this a complete field! After a lot of reading of papers, thinking, and testing some ideas, I decided to apply for a summer research internship under Prof David Brumley -- the aforementioned faculty advisor.

After a bunch of emails back and forth, he was happy to invite me over to CMU for a fully funded summer internship. This intern was arguably the best time I've ever had while working, probably because it didn't feel like work at all. I was helping develop systems to augment humans in finding vulnerabilities in common software. Additionally, this is where I fell in love with using Emacs (it is better than Vim, but I digress again), started loving OCaml, and figured out that this field is probably what I want to continue in for the foreseeable future.

Once back in Roorkee, 4th year began, and with it, came the whole "tension" of applying for further studies. While batch-mates prepared for job interviews, I had decided that I wanted to go for a PhD. Taking up the standardised examinations, asking for letters of recommendation, writing a statement of purpose, updating my resume, filling out forms -- it does turn out to be a very time-consuming task. However, it indeed is a very rewarding task if done right, in two major ways -- it helps you really think about all you have done and what you want to do further in life, and it provides a sense of confidence in knowing that you can really achieve all that you wish to (since you've achieved a lot of what you originally set out to do, and even when goals changed, you've adapted and done well).

Managing time between all these, as well as the Bachelor Thesis Project (BTP), a whole semester flew by almost instantly; all the applications to the 5 institutes I had applied to were also sent. What now?! Then began the long and dreadful wait (at least, as described by others). The last semester of college was one where compared to the previous 7 semesters, I had very little to really do. Academics had gotten very chill, BTP was going fine, campus groups were being handled easy enough, I'd gone on multiple trips/treks, I was taking part in multiple contests, and yet I had time to spare. So I decided that I would spend my free time divided amongst 2 main things -- spending much more time with those I am close to, and helping juniors out on a larger scale.

Over the previous few years, I had formed extremely strong bonds with a lot of people on campus -- people I loved spending time with, whether they were juniors, seniors or batch-mates. However, each of us were busy with our own things until then. Now, with the little bit of extra free time on our hands, we could spend more quality time. Each moment spent definitely becomes more precious, as the time to go apart draws closer.

Apart from this, while I had already been helping juniors out in various ways, mainly through mentoring or just being there for advice when needed, I decided that I should start working on making a larger impact on the general crowd, and help move IITR's culture more towards research. Over the years, I had tried pushing for this culture in as many ways that I could, but now it was time to reach out to as many as possible, before I bid adieu. So, along with others who had done research interns, and others who were doing research, I took a few talks, and reached out to everyone who was interested (even casually) in considering research as a career possibility. Turns out, there are many who want to try things, but just don't know whom to approach. I realised that it became a moral duty for those of us who had been through the pains of starting off, without having any guidance, to actually guide those who wanted to pursue their dreams; and this was amongst the most fulfilling things I have ever done.

While all this was going on, the decision letters started to arrive from the different institutes, and I had gotten selected into 3 really amazing places -- MIT, UCSB, and CMU. And suddenly, I had a problem -- how can you really decide between 3 of the best places for the kind of research I wanted to do? Of course, getting selected into any of these is a matter of joy and pride, but getting into all 3 is a bit of a problem. Of course, it is a good kind of problem to have though. After much deliberation, I decided that I would go with CMU. And with that, began the farewells, of the different campus groups, as well as the department.

It doesn't hit that hard until you reach the point of the farewells, how much you have come to love the time spent over the 4 years. It is at this time that many tend to get emotional since, for some of us, it might be a very very long time until we meet again. Thankfully though, the tradition of dressing up as different characters (such as Jack Sparrow, or Davy Jones), does provide a bit of necessary relief from the otherwise extremely emotionally charged farewell.

However, even after all of this, Roorkee doesn't let you go. The final semester exams, and the final BTP evaluation happen after the farewells, and even after that comes up the extremely amazing tradition of the "no-dues". Most people really understand these pains only when they need to rush through campus, in the hot sweltering sun, just to get a few signatures on pieces of paper that are probably gonna be thrown away in almost no time. Yet there is a kind of fun in everyone cursing the same people and the tradition each year, standing in those long queues, waiting for those coloured slips of paper.

Soon enough though, even that is done, and it comes time to finally say goodbye, and it really breaks my heart to say goodbye. It was an amazing 4 years, and I am going to cherish these memories forever. A lot of the connections and friendships that I have made, cannot even be put into words, but I feel that those are what I am going to remember more than anything else. I believe it is the people, and not the place, that really grows on you.

If I have any advice for 1st yearites, it would be this: across the 4 years, not everything you do has visible benefit, and not everything will get to the resume. Not every part of it needs to be for a predestined goal. Nor does it need to be just smooth-going successes. Do not limit yourself to only academics, but neither let it suffer. You will stumble, and you will struggle, but that is part of the process. You just got to keep your head cool and keep moving forward, trying new things, finding new interests, building strong friendships, and enjoying your time at college. Because as long as you don't waste them, these 4 years are definitely going to become a story to tell.

Wednesday, 13 January 2016

Writeup for n00b16CTF

Ah! Year 2 of n00b ctf @ backdoor. Quite a nice and fun contest. Though this contest was for n00bs, I decided to take part again (just coz I won last time - find the writeup for those questions here). This time, I came second, since I couldn't solve 2 of the questions, and one other team (dcua) solved all except 1. However, a short while after the contest was done, I solved another. Now only the question titled EULA remains. Hopefully, I'll update this post with its writeup soon too :)

Here's a bunch of writeups (more like small hints) for each of the questions I solved:

Batman

Take a look at the URL as you click on the items. Maybe higher numbered items are interesting (even if they cannot be directly accessed). The Hackbar add-on for Firefox makes this very easy to go through, btw.

Bin-Easy

Okay, I didn't even run the binary. The answer was just lying there in one of the strings (hint hint) inside it directly. Pre-execution-analysis itself showed up the flag.

Bin-Medium

We can disassemble this executable and change the path of execution to either circumvent the check (through gdb) or by flipping one bit in the binary. Each of these would print the flag. Another thing is to try to work out how the flag is being made in the memory, but that's just wasting your time. Take the easy path :P

EULA

Still unsolved by me. However the Accept-Language HTTP parameter seems very interesting, especially with things like en, it etc.

Fool

A very high stake challenge (it had 150 points) but it a very short while, it was obvious it is a format string bug. After that it is trivial to attack it (read up on FSBs if it isn't). I made a stupid mistake though which took a couple of minutes to realize. I was sending a single quote at the beginning and at the end. Doh!

Frost

A multi byte xor cipher. My favourite tool for this (xortool.py) works directly.

Magic

The = at the end gives away that it is base64 encoded. Decoding it gives another cipher but the spacing makes it obvious that it must be some simple ancient cipher. Caesar would be proud.

Matrix

I couldn't solve this challenge during the contest, but solved it just after it, and damn, was I over thinking it. During the contest, I realized that the gs were too regular (every 7th place) so I split it into 32 lines of 6 characters each (after removing the gs at the end of each line). All these were hex digits, and with the 6 character thing, I naturally thought of colours (for example, #00ff00 is green). Lot of time was wasted in this direction, but after the contest and a small hint of transposing the matrix (i.e. making it into 6 rows of 32 characters each), it was obvious that each of those could be hashes. Online MD5 crackers then gave away the answers.

Rbash

Restricted bash - where cd wasn't allowed. Well, ls -l .. worked. And there was the flag right there to be cated to the screen. Later on, I found out that you could also execute shell inside this rbash and then you could do anything you wanted!

Robot

The name of this challenge gave away that we needed to look at robots.txt (that pesky little file that web crawlers are supposed to adhere to, but only lead to attackers finding things they shouldn't).

Wahtzdis

Whoa! That is a lot of data, but seems like the javascript console in the browser nicely handles it. Turns out this is the esoteric language JSFuck.

Did you find any other cool/new ways of solving any of these tasks? If so, leave a comment below.

Friday, 23 January 2015

Dracula Untold

A thrilling origins story that would make for a great video game.

Vlad is a Prince whose family is threatened by the Turks. In order to save his family and the people of his country, he must seek a power much greater than any human can have. But with that power comes a price much greater than a human should bear. He must struggle to keep his humanity and save his people.

Personally, I loved the story, but felt that the ending was such a huge cliff-hanger, it deserves a sequel. If there were to be a sequel, it will definitely be one that surpasses Dracula Untold. With amazing acting by every member of the cast, combined with a well written origin story (though hardcore Count Dracula fans would dismiss it), topped off with good directing and visual effects, this movie is definitely one to watch.

Some criticism is required however. A lot of the story line was given away in the trailer itself, so if you've seen the trailer in advance (as I did), you'll find that the movie only does some filling in of details (and the bad ass ending, which I have been going on about). This movie is not one to be seen multiple times. Also, it does feel like the movie could've been an awesome game instead. Imagine doing the bat changing thing! If a sequel were to be made however, which continued right where this one ended, I would definitely want to watch it in theaters.

So, what did you think of this movie? Leave your comments below.

PS: Turns out that a sequel might actually be done (executive producer Alissa Phillips revealed so).

Monday, 5 January 2015

Writeup for n00b15CTF

This is a collection of hints for all the problems in the recently conducted Capture The Flag (CTF) contest conducted by SDSLabs as a way to get n00bs (beginners) to have a taste of the beautiful world of hacking. It was a pretty fun contest even though it was quite easy.

Test

Use a SHA256 tool. My favourite is to just search for "SHA256 STRING_TO_BE_SHA256" on my default search engine (www.duckduckgo.com)

Location-51

There is a redirect occuring here from http://hack.bckdr.in/LOCATION-51/index.html to http://hack.bckdr.in/LOCATION-51/trap.html Stop this redirect, and read the source of index.html The javascript there gives away the flag

Hidden flag - Easy

Using the file linux command, we find out that it is an ELF binary, but running it gives nothing. However, running strings on it gives away the flag.

Search

The zip file contains a .txt file which does not seem normal text, so we run a file on it. This says that it is jpeg, so change it to .jpg and open it. It is a QR code. Decode this using some online tool (just search for "QR code decode online" for a large number of free tools) and get a link. The link has the flag.

Lost

The message says Console, so open up console in Firefox. The message then tells you to POST data to a link. Going to this link directly does nothing, but sending it any random POST data (using HackBar addon in Firefox for example) gives the flag.

Hidden flag - Medium

Analyzing the file with IDA Pro shows that there is a function called print_flags() which is not called inside main(). Running this function should print the flag. We can do this by attaching gdb to the binary, breaking the execution and running the print_flags() function.

Clutter

Extracting the file and analyzing with Wireshark shows that there is too much to work with. But exporting all the files and then running strings on it would probably work. However, filtering this is a pain, so I just ran a grep for flag and the answer will be visible near a pastebin title.

No - Signal

Use GIMP or Photoshop to add the images. The flag should be obvious then.

Sound

Slow down and reverse the sound wave using Audacity. Listen to it and it should be obvious what the flag is.

Sequel

Looking at the code, it seems like a SQL injection can be done here. Downloading the database.sdb file and rewriting the source code to start throwing data from database, you realize that there is no user sdslabs in the database. This makes it obvious that you need to add the user. The following username virtually that: ' UNION SELECT 'sdslabs','sdslabs','sdslabs','sdslabs','0c4ea8f5b344600f78516334254e9e085f2225a42a0bb18fa8bd774589f1ca19' UNION SELECT * FROM users WHERE '0'='1. Note that this query will not work directly, the password will have to be set accordingly.

Undisputed

The file is a ext4 filesystem (use file command if you don't trust the extension). Mount this in linux using the mount command (read man mount to know how) and then open the file inside to see the flag.

Did you find any other cool/new ways of solving any of these tasks? If so, leave a comment below.

Link to writeup for n00b16ctf

Sunday, 14 December 2014

How to Start Competitive Programming

If you've ever thought about competitive coding, all you've got to do is start now. Doesn't matter which class or branch you are in. Everyone can code and everyone can be competitive. Basically, all you've got to have is the right mindset and the right skill set.

Let's talk about the mindset first. Competitive programming requires a drastically different mindset when compared to "normal" (or developmental) programming. While in developmental programming you have to concentrate on writing quality code that is manageable, competitive programming generally involves writing code that "just works". This is because you'll probably never even look at your code again, once you get an AC (accepted) result from the grader. This does not mean that you should write sloppy code since that would just make it hard to debug (and this can be a real pain in the a** in a competitive setting, with the added pressure of time running out). All it means is that the code need not have extensive documentation (just some inter-spread comments to help out when debugging are enough) or modularity/objects/classes (just write it in logically separate functions to make life easier).

What about the skill set? Basically, the only requirement is that you should be able to write in the language that the grader accepts without having to look at the documentation for the basics. Since almost all graders support C++ (I haven't found any that don't, except in some "special language contests" etc), and C++ has a massively powerful library called STL (standard template library), C++ is the natural choice for most competitive coders. However, if Java is your mother tongue, a fairly large number of graders accept this as well, but do note that Java is known to be slightly slower and more cumbersome to write in a pressured setting. Java does have one major advantage over C++ and this lies in its BigInteger library. However, this is so rarely used that it isn't worth the hassle of doing everything in Java just for this. Most coders use C++ for everything except for when the question requires large integers (larger than 10^18 or so).

I've put together a small set of FAQs below and this should help. If you do have any more questions, do ask in the comments below.

Q: Which language should I use?
A: C++ (Read above for why)

Q: How should I learn C++?
A: The tutorials on cplusplus.com are extremely useful (http://www.cplusplus.com/doc/tutorial/). Not to mention, their documentation is in depth and is useful later on also, when learning things like STL.

Q: Any specifics of the language that are more useful?
A: C++ is a very powerful language with a gazillion features (and more added with C++11), but a whole large chunk of them are not necessary whatsoever for competitive programming. Basically, you should be comfortable with variable types, I/O (using cin, cout - there are faster ways but those are usually not necessary unless a 0.001 second improvement is the only thing that's causing a TLE - usually, the algorithm can be improved instead), control statements (if, ?: operator), looping (for and while, break and continue), functions (call by reference and call by value), arrays, strings (as character arrays), struct. Once you're comfortable with these, learn how to use basics of STL (vector, sort() are most useful). Pointers and dynamic memory is usually a larger headache than anything in the pressures of a contest. See a further question for more. Also, classes and objects will usually not be used since your code usually depends heavily on algorithms and does not usually break down into a object-oriented way.

Q: What do I do for unknown sizes of memory? Is int x[n]; OK? Can I use pointers for this (using int *x = new int[n];)?
A: NO!!! int x[n]; is NOT OK. Arrays need to have a compile time constant size. n is not a constant at compile time. Pointers and dynamic memory is too much of a headache. Just declare an array that is of the size that is max allowed (based on question). If question says n is less than 10^6, then just declare int x[1000010];. The extra 10 is just to be safe :P. Oh, and put large arrays in global scope (declare outside the function) so that it doesn't cause stack overflow problems (can cause SIGSEGV on graders). And don't forget to initialize them using loops whenever necessary. If you really need a truly dynamic sized array, don't use pointers, use std::vector (http://www.cplusplus.com/reference/vector/vector/).

Q: Where do I learn algorithms from?
A: Best place I've found is the MIT OpenCoursware. It is a BTech level course (but usually understandable by most Indian school students too). Link: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/

Q: What if I prefer books instead of videos?
A: You might've heard of the book we fondly call "murder-weapon" due to it's size. CLRS (Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein) is like the bible of algorithms. If you have the guts to read this huge book, by all means, please do. However, if you want something smaller and more manageable, try the book by Skiena (The Algorithm Design Manual).

Q: Which sites should I practice on?
A: SPOJ has the largest collection of problems. CodeChef has a very nice classification of Easy, Medium, Hard problems. UVa has a large number of problems suitable for ICPC training. CodeForces holds extremely regular contests (and editorials are also regular). TopCoder has regular SRMs but has a very messy arena applet that you have to get used to (but tutorials on TopCoder are great). HackerRank has a nice set of problems divided by difficulty and topic. Finally, it is up to you. I currently use UVa and CodeForces the most, but would suggest CodeChef for beginners due to its classification.

Q: What to do if I get a TLE (Time Limit Exceeded)?
A: Usually your algorithm might not be good enough. Check if the number of elemental operations (like single element access or write or operation etc) totally stays bounded by 10^6 or 10^7. If so, it should run within a second. Otherwise, you're in trouble. Even an algorithm that takes 10^8 operations would be too slow. For example, on an array of size 10^5, a sort using bubble sort (requiring quadratic time) is too slow, but merge sort (requiring N log N time) runs within a second.

Q: What to do if I get a MLE (Memory Limit Exceeded)?
A: You really need to cut down on what memory you are using. Reuse memory space if possible. Usually this verdict from a grader is very very rare. I've got it only once or twice totally in my whole experience until now, and it was fixed by rethinking the usage of memory. A million sized array if integers takes up 4 MB of space, and million 'long long's take up 8 MB space, so this is usually not a problem at all.

Q: What to do if I get a WA (Wrong Answer)?
A: Check if you are missing any \n while outputting your answer. This is a common mistake that might happen even if code and algo is correct otherwise. If that isn't the problem, try running the algo on paper and see that it matches for all cases you can throw at it. If it still works, you haven't tried some corner cases (n=0,1, or very very large). Also, are you sure you are using the right data type? int stores max of ~10^9, long long stores ~10^18. Use it instead. Remember, intermediate calculations also should fit within the limit. If all else fails, try rethinking the algorithm.

Q: I cannot figure out the algorithm. What should I do?
A: If this is in a contest, there is usually not much you can do other than just think or move onto another question. However, if this is when practicing, try working out the example on paper first to make sure you've understood the question. If nothing yet, don't give up (or check online yet). Mentally take note of the question and go on with other work. The answer will strike at the most random time (in the shower for instance). If you still get nothing (after a few days/weeks/months depending on difficulty of question), search for an editorial for the question, or ask someone who's solved it. If it involves a new concept, learn it (don't just copy code without understanding). If it did not require anything new, then re learn whatever was needed. After that, try coding it in and getting an AC. Search for related concepts and questions and solve them.

Q: I know the algo, but cannot code it in. What should I do?
A: Isn't this obvious? Relearn the topics needed.

Q: Can you see my code for problem XXX from site YYY?
A: Nope. I do not have the time to do this individually. Also, competitive code is usually hard to read and difficult to find mistakes in. I usually cannot understand my own (competitive) code after a few days (developmental code is understandable even after years however, since it is written that way). You're on your own. However, if you write the algo in simple English (or pseudo code) then it might help for you to figure out what is wrong by yourself. Also online forums exist just for this. If nothing else, if you can find me a great question, I might just write an editorial on it (see example here).

Q: My question is not in this FAQ.
A: Write it in the comments below.