From 7f22b3c33769813ab9e3a797fb57c089b5d82e27 Mon Sep 17 00:00:00 2001 From: Bruno BELANYI Date: Sun, 8 Aug 2021 22:39:00 +0200 Subject: [PATCH] report: final touch ups --- report.md | 206 +++++++++++++++++++++++++++--------------------------- 1 file changed, 102 insertions(+), 104 deletions(-) diff --git a/report.md b/report.md index efbd356..9bc163b 100644 --- a/report.md +++ b/report.md @@ -1,4 +1,4 @@ -# Exec Sum +# Executive Summary When making my decision to major in Image Processing and Image Synthesis, my main motivator was my growing curiosity for high performance programming, as I @@ -11,7 +11,7 @@ those interests. When the opportunity arose to work for IMC, a leading firm in the world of *market-making*, I jumped at the chance to apply to them for my internship. When asked for the kind of work I wanted to do during that time, I highlighted my -interests in performance, which lead to the subject of writing a benchmark +interest in performance, which led to the subject of writing a benchmark framework for their new exchange connectivity layer, currently in the process of being created and deployed.\ This felt like the perfect subject to learn more about finance, a field I had @@ -33,7 +33,7 @@ framework that could be used to measure the performance of such a gateway. To do so, we must be able to instrument one under various scenarios meant to mirror real-life conditions, or exercise edge-cases. -This lead me to first get acquainted with the components that go into running +This led me to first get acquainted with the components that go into running the gateway, and what is necessary on the client side to make use of it through the *Execution API*, which is in the interface exposed to downstream consumers of the gateway: the trading algorithms.\ @@ -67,17 +67,17 @@ generate specific scenarios to test its behaviour. 
This allowed me to reuse most of the code what I had written for the benchmark, and apply it to writing the tests.\ The need for reliable tests meant that I had to do a lot of ground work to -ensure that they were not flaky, this is probably the part that took longest in -the process, with some deep investigations to understand some subtle bugs and +ensure that they were not flaky; this is probably the part that took the longest +in the process, with some deep investigations to understand some subtle bugs and behaviours that were exposed by the new tests I was attempting to integrate. Towards the end of my internship, I presented the work I did on the framework to other developers in the execution teams. This was in part to showcase the work -being done by the *Global Execution* team, and to participate in the regular -knowledge sharing that happens at IMC. +being done by the *Global Execution* team, and also to participate in the +regular knowledge sharing that happens at IMC. Joining a company during the COVID period of quarantines, working-from-home, and -the relatively low amount of face-to-face contact high-lighted the need for +the relatively low amount of face-to-face contact highlighted the need for efficient ways of communicating with my colleagues. Being part of a productive, highly driven team has been a pleasure. @@ -90,9 +90,9 @@ a software engineer, and the impact of my studies at EPITA. \newpage -# Thanks and acknowledgements +# Acknowledgements -First off, I would like to thank Jelle Wissink, an engineer from the Global +Firstly, I would like to thank Jelle Wissink, an engineer from the Global Execution team at IMC. As my mentor, he helped me get acquainted with the technologies used at IMC, guided my explorations of the problems I tackled, and was of great help to solve problems I encountered during my internship. I would @@ -114,7 +114,7 @@ him through the Tiger maintainer team. 
* Élodie Puybareau and Guillaume Tochon, researchers at EPITA's LRDE, and head teachers of the IMAGE major. They are great teachers, very involved, and always listening to student feedbacks. They have handled the COVID crisis admirably, -taking into account the safety of their students and the work load imposed upon +taking into account the safety of their students and the workload imposed upon them. * The YAKA & ACU teams, a.k.a. the Teaching Assistant teams for EPITA's first @@ -129,8 +129,8 @@ and my girlfriend Sarah for her unwavering support. My internship is about benchmarking the new service being used at IMC for connecting to and communicating with exchanges. -IMC is a technology-driven trading company, specializing in market making on -various exchanges world-wide. Due to this position, they strive for continuous +IMC is a technology-driven trading company, specializing in market-making on +various exchanges worldwide. Due to this position, they strive for continuous improvement by making use of technology. And in particular, they have to pay special attention to the performance of their trading system across the whole infrastructure. @@ -186,7 +186,7 @@ the connection between internal trading services and external exchanges' own infrastructure and services. It is at this layer that exchange-specific protocols are normalised into IMC's own protocol messages, and vice versa. -Here is the list of tasks that I am expected to have accomplished during this +Here is the list of tasks that I was expected to complete during this internship: * Become familiar with the service. @@ -194,21 +194,21 @@ internship: * Benchmark the system under the load. * Analyze the measurements. -This kind of project is exactly the reason that I was interested in working in +This kind of project is exactly the reason why I was interested in working in finance and trading. 
It is a field that is focused on achieving the highest performance possible, because being faster is directly tied with making more trades and results in more profits. -Because I expressed this personal interest for working on high performance -systems and related subjects, I was given this internship project to work on. +The project was therefore perfectly aligned with my interests and the skills that I +already had, or hoped to develop further. # Context of the subject ## Company trade IMC, as its name suggests, is a market maker. It is specialised in providing -liquidity in the market by quoting both sides of the market, and profit off the -trades they make while providing this service. +liquidity in the market by quoting both sides of the market, and profiting off +the trades they make while providing this service. One key ingredient to this business is latency: due to the competitive nature of the market, we must process the incoming data and execute orders fast enough not @@ -223,8 +223,8 @@ lost opportunities, therefore less profits. It must also take on other duties, due to it being closer to the exchange than the rest of the infrastructure. For example, a trading strategy can register conditional orders with this service: it must monitor the price of product A and -X, if product A's cost rise over X's, then it must start selling product B at -price Y. +X, and if product A's cost rises over X's, then it must start selling product B +at price Y. ## Strategy @@ -241,8 +241,8 @@ ensure this is to measure it. 
With that context, let's review my expected tasks once more, and expand on each of them to get the roadmap: -* Become familiar with the service: before writing the code for the benchmark I -must first understand what goes into the process of a trade at IMC, what is +* Become familiar with the service: before writing the code for the benchmark, I +first needed to understand what goes into the process of a trade at IMC, what is needed from the gateway and from the clients in order to run them and execute orders. There is a lot of code at IMC: having different teams working at the same time on different trading service results in a lot of churn. The global @@ -251,20 +251,20 @@ provided to the rest of the IMC workforce. The global execution gateway is one such project, aiming to consolidate all trading strategies under one singular method to send orders to their exchanges. -* Write a dummy load generator: we want to send orders under different +* Write a dummy load generator: we wanted to send orders under different conditions in order to run multiple scenarios which can model varying cases of execution. Having more data for varying corner cases can make us more confident -of the robustness and efficiency of the service. This is especially needed -becaue of the various roles that the gateway must fulfill: not only must it act +of the robustness and efficiency of the service. This was especially needed +because of the various roles that the gateway must fulfill: not only must it act as a bridge for the communication between exchanges and traders, but also as an order executor. All those cases must be accounted for when writing the different scenarios. -* Benchmark the system under the load: once we can run those scenarios smoothly -we can start taking multiple measurements. The main one that IMC is interested -in is wire-to-wire latency (abbreviated W2W): the time it takes for a trade -to go from a trading strategy to an exchange. 
The lower this time, the more -occasions there are to make good trades. +* Benchmark the system under the load: once we could run those scenarios +smoothly we needed to start taking multiple measurements. The main one that IMC +is interested in is wire-to-wire latency (abbreviated W2W): the time it takes +for a trade to go from a trading strategy to an exchange. The lower this time +is, the more occasions there are to make good trades. * Analyze the measurements: the global execution team has some initial expectations of the gateway's performance. A divergence on that part could mean @@ -275,7 +275,7 @@ the timing distribution: the smaller it is the better. Having a low execution time is necessary, however consistent timing also plays an important role to make sure that an order will actually be executed by the exchange reliably. -## Internship positioning amongst company works +## Internship positioning among company works My work was focused on providing a framework to instrument gateways under different scenarios. @@ -304,7 +304,7 @@ IMC, understanding the context surrounding the team I am working in, and learning about the different services that are currently being used in their infrastructure. I had to write a first proof of concept to investigate what, if any, dependencies would be needed to execute the gateway as a stand-alone system -for the benchmark. This has allowed me to get acquainted with their development +for the benchmark. This allowed me to get acquainted with their development process. After writing that proof of concept, we were now certain that the benchmark was @@ -312,13 +312,13 @@ a feasible project, with very few actual dependencies to be run. 
The low amount of external dependencies meant fewer moving parts for the benchmarks, and a lower amount of components to setup.\ For the ones that were needed, I had to write small modules that would model -their behaviour, and be configured as part of the framework to provide them as +their behaviour, and be configured as part of the framework, to provide them as input to the gateway under instrumentation. ## The framework With the exploratory phase done, writing the framework was my next task. The -first thing to do was ensuring I could run all the necessary component locally, +first thing to do was ensuring I could run all the necessary components locally, not accounting for correct behaviour. Once I got the client communicating to the gateway, and the gateway connected with the fake exchange, I wrote a few basic scenarios to ensure that everything was working correctly and reliably. @@ -332,10 +332,10 @@ testing of regressions during the testing pipeline that are run before merging patches into the code base. Once this was done, further modifications were done to allow the benchmark to be -run using remote machines, with a lab set-up specially made to replicate the +run using remote machines, with a lab setup specially made to replicate the production environment in a sand-boxed way. This was done in a way to transparently allow either local or remote runs depending on what is desired, -without further modification of either the benchmark scenarios, or the framework +without further modification of either the benchmark scenarios or the framework implementation for each exchange. Under this setup, thanks to a component of the benchmark framework which can be @@ -366,7 +366,7 @@ Integration pipeline is valuable to avoid regressions. * Some consumers of the *request-based API* in production are going to be in use for long periods of time without a possibility for upgrades due to -comformability testing. 
To avoid any problem in production, it is of the up most +conformance testing. To avoid any problem in production, it is of the utmost importance that the *behavior* stays compatible between versions. To that end, I endeavoured to do the necessary modifications to the current test @@ -389,8 +389,8 @@ level but up to the observable behaviour. ## Documenting my work -With that work done, I now need to ensure that the relevant knowledge is shared -across the team. This work was two-fold: +With that work done, I now needed to ensure that the relevant knowledge was +shared across the team. This work was two-fold: * Do a presentation about the benchmark framework: because it only contains the tools necessary as the basis for running benchmarks, other engineers will need @@ -401,9 +401,9 @@ justified some design decisions. * How to debug problems in benchmarks and compatibility test runs: due to the unconventional setup required to run those, investigating a problem when running either of them necessitates specific steps and different approaches. To help -improve productivity when investigating those, I share how to replicate the test -setup in an easily replicable manner, and explain a few of the methods I have -used to debug problems I encountered during their development. +improve productivity when investigating those, I shared how to replicate the +test setup in an easily replicable manner, and explained a few of the methods I +have used to debug problems I encountered during their development. ## Gantt diagram @@ -447,9 +447,9 @@ end note ## Delivering a project from scratch to completion -During the course of my internship, I have had to deliver a product from its -first *Proof of Concept* to a usable deliverable, going through various -iterations along the way. +During the course of my internship, I had to deliver a product from its first +*Proof of Concept* to a usable deliverable, going through various iterations +along the way. 
This process started with me getting familiar with the IMC code base, coming up to speed with the tooling in use, some of the knowledge needed to work on the @@ -465,7 +465,7 @@ mentor, I could identify the needed dependencies that would to be provided to the gateway binary in order to instrument it under different scenarios. I worked on writing those components in a way that was usable for the benchmark, -making sure that they were working an tested along the way. One such component +making sure that they were working and tested along the way. One such component was writing a fake version of the RDS that would be populated from the benchmark scenario, which provided the information about financial instruments to the gateway in order to use them in the scenario, e.g: ordering a stock. @@ -473,7 +473,7 @@ gateway in order to use them in the scenario, e.g: ordering a stock. I went on to write a first version of the benchmark framework for a specific gateway and a specific exchange: this served as the basis for further iteration after receiving feedback about my design. Writing a second benchmark for a -second exchange and gateway lead to more re-design. +second exchange and gateway led to more re-design. The basic components of the benchmark framework were useful outside of their original intended purposes, as I could reuse them to write the compatibility @@ -488,19 +488,19 @@ using gateway binaries. I also gave a presentation at the end of my internship to demonstrate how to run a benchmark, and explain the main components of the framework. -I have delivered a complete, featureful product from scratch to finish, complete -with documentation and demonstration of its use. This is at the heart of our +I have delivered a complete, featureful product from start to finish, complete +with documentation and demonstration of its use. This is a central goal of our schooling at EPITA: making us well-rounded engineers that can deliver their work to completion. 
## Acquiring new skills and knowledge IMC is part of the financial tech sector, taking a position of market-maker on a -large amount of exchanges world-wide. +large amount of exchanges worldwide. The financial sector, even though I was attracted to it by my previous exposure from conference sponsors and blog posts from engineers in the sector, was still -something I was not familiar with when first joining the company. +unfamiliar to me when first joining the company. There is a large amount of vocabulary and knowledge specific to this industry, not even to mention the infrastructure and tooling in use at IMC, which while @@ -511,21 +511,21 @@ Before starting my internship, I was advised to read a book about high frequency trading, which gave me some context on how exchanges work, and a few important words that are part of the financial vocabulary. In addition, I learned about IMC's trading infrastructure through a number of presentations that my team lead -organised with new hires during the beginning of my internship. This not only -gave me more context about what part of the existing infrastructure was aimed to -be replaced by the new *Execution Gateway* and the *Execution API*. I also got -to learn about some of the basics of pricing theory, which underpins our whole +organised with new hires during the beginning of my internship. This gave me +more context about what part of the existing infrastructure was aimed to be +replaced by the new *Execution Gateway* and the *Execution API*. It also taught +me about some of the basics of pricing theory, which underpins our whole strategy layer to come up with an appropriate valuation for any product we are interested in trading. 
I got to further learn about trading and option theory during a training week organized with a dozen other summer interns: we were taught some of the mathematics that form the basics of valuation reducing risks in trading, the -associated vocabulary, and apply them during workshop exercises in trading with -the other interns. +associated vocabulary, and applied them during workshop exercises in trading +with the other interns. On the technical side, not only did I learn about the software stack in use at -IMC, as I worked on more and more parts of the code base I discovered new +IMC, but as I worked on more and more parts of the code base I discovered new tooling put in place to work and debug parts of our stack that are too costly to setup or use on any dev computer. One such solution is the *fullsim* system, which allows us to simulate our FPGA engines in software, to allow developers to @@ -534,25 +534,24 @@ cards or know how to use them. I also introduced my colleagues to new tools that they were unaware of, the most prominent being the one I always reach for first when trying to debug a piece of software: `rr`, which allows one to record a program's execution and run it under a debugger in a totally deterministic -manner: it allows replaying and rewinding execution at will, making it a great +manner -- it allows replaying and rewinding execution at will, making it a great asset when dealing with issues that are sporadic, or require tricky timings like networked systems. IMC encourages knowledge sharing across all teams, it permeates the company -culture, and shows in many ways. An execution engineer is encouraged to learn +culture and shows in many ways. An execution engineer is encouraged to learn about trading, which gives us more context when interacting with traders, -spotting mistakes in new strategies or guiding which features would make sense +spotting mistakes in new strategies, or guiding which features would make sense to write next. 
Catch-up meetings are organized regularly between teams. Presentations are given to teach people about the work that is being accomplished to improve every part of our infrastructure, from deployment tooling, developer productivity, to new strategies or components of our systems. -Thanks to my well-rounded education, not only do I feel comfortable being -exposed to all this information. But I have felt confident that I fit in from -the start, and could keep pace with the information that was fed to me. I am -able to pick up those new skills fast, because EPITA taught us the most -important skill of this trade: I learnt how to learn, and how to flourish while -doing so. +Thanks to my well-rounded education I felt comfortable being exposed to all this +information. But I also felt confident that I fit in from the start, and could +keep pace with the information that was fed to me. I am able to pick up those +new skills quickly, because EPITA taught us the most important skill of this +trade: I learnt how to learn, and how to flourish while doing so. # Illustrated analysis of acquired skills @@ -574,15 +573,15 @@ We can say that IMC has embraced a more Agile way of delivering new features: the products are continuously being worked on and improved, the work being organized into a backlog of issues, partitioned into epics. And similarly, the company culture embraces a few of the processes associated with Agile -programming. The one has most affected me is the daily stand up, a meeting +programming. The one that has most affected me is the daily stand up, a meeting organized in the morning to interact with the rest of the team, summarising the work that has been accomplished the day before, and what one wishes to work on during the day. 
-During the times of remote-work because of COVID, interactions with the team at +During the times of remote work because of COVID, interactions with the team at large feel more limited than they otherwise would be when working alongside one another at the office. I have learned to communicate better with my colleagues: -explain what I am working on, reaching out to ask questions, and discussing +explaining what I am working on, reaching out to ask questions, and discussing issues with them. ## Working in a large code base @@ -600,7 +599,7 @@ Due to that difference, my way of writing software and squashing bugs had to evolve, from an approach that worked on small programs to one that is more scalable: I could not just dive into a problem head-first, trying to understand everything that happens down to every detail, before being able to fix the -problem. The amount of minutia is too large, it would not be productive to try +problem. The amount of minutiae is too large, it would not be productive to try to derive an understanding of the whole application before starting to work on it. @@ -615,7 +614,7 @@ important pieces of a puzzle. ## Debugging distributed systems -My work specifically centers around running, interacting with, instrumenting, +My work specifically centered around running, interacting with, instrumenting, and observing production binaries for use in testing or benchmarking. Due to this, and because nobody writes perfect code the first time, I have had @@ -633,9 +632,9 @@ understanding of the issue. This iterative process of chipping away at the problem until the issue becomes self-evident is inherent with working on such systems. One cannot just inspect all the processes at once, and immediately derive what must have happened to -them. It feels more akin to detective work, with the usual suspect not being Mrs -Pink in the living room with the chandelier, but instead my own self having -forgotten to account for an edge case. +them. 
It feels more akin to detective work, with the usual suspect not being +Colonel Mustard in the dining room with the wrench, but instead my own self +having forgotten to account for an edge case. # Benefits of the internship @@ -653,15 +652,15 @@ integration of running the benchmarks with the build system in use at IMC, and its Continuous Integration pipeline, it can easily be used to monitor the evolution of performance and watch for regressions. Further down the line, it can be integrated with the change point detection service that is being -developed in house, to simply contact the relevant people when the system +developed in-house, to simply contact the relevant people when the system detects that a regression has happened: the offending change can be identified more easily that way. This is key to staying competitive, ensuring the latency of our systems remain as low as possible and do not creep upwards. * My work on compatibility testing, which is an important step in avoiding any -surprising behaviour or downtime in production. Due to the long turn around time -of upgrades in certain regions, and the cost of lost opportunity for any down -time, minimizing the probability of any problem that could be experienced +surprising behaviour or downtime in production. Due to the long turnaround time +of upgrades in certain regions, and the cost of lost opportunity for any +downtime, minimizing the probability of any problem that could be experienced results directly in more profits being made. ## Furthering my learning @@ -671,21 +670,20 @@ and knowledgeable colleagues, and tinker on what constitutes the basic bricks of IMC's production software. Working at IMC was my first experience with such a large code base, a dizzying -amount of code. It is impossible to wrap you head around *everything* that is +amount of code. It is impossible to wrap your head around *everything* that is happening in a given program. 
Up until that point I had only encountered school projects, of relatively small size and whose behaviour could easily be understood. Dealing with problems by trying to understand everything that is happening in a program is a valid strategy for those. It is not, however, a -scalable way of working on software, and I needed change my way of thinking +scalable way of working on software, and I needed to change my way of thinking about and dealing with the problems I encountered during my work. To cope with -that, I learned how to better handle problems I encountered by trying to isolate -the actual source of the problem, instead of trying to understand the whole -system around it. +that, I learned to isolate the actual source of the problem, instead of trying to +understand the whole system around it. Interacting with the team was a great help in that endeavour. Knowing who to ask questions to, and learning how to ask relevant questions are once again -essential in achieving productivity in those circumstances. This is doubly so in -times of remote-working, when turning around and asking your colleague a +essential in achieving productivity in those circumstances. This is doubly true +in times of remote working, when turning around and asking your colleague a question is not so simple. I had trouble at first to actively use the internal messaging app to ask questions, and was encouraged to ask questions liberally instead of staying stuck on my own. @@ -695,16 +693,16 @@ instead of staying stuck on my own. ## Education and career objectives I chose to major in Image Processing and Image Synthesis for multiple reasons, -most notably I had an interest in high performance programming, and thought that -this major would yield well to it. This proved to be true, although more so due -to applying it to the projects that we were given rather than the courses we -were taught (except for a few which specifically focused on it). 
+notably my interest in high performance programming, and thought that this major +would lend itself well to it. This proved to be true, although more so due to +applying it to the projects that we were given rather than the courses we were +taught (except for a few which specifically focused on it). Through watching conference presentations, I learned about the field of finance and thought it would provide interesting challenges that aligned with my interests. This motivated my choice to intern at IMC, even though their business is far removed from the core teachings of my major. This too, proved to be true, -and I'm glad to see my initial hunch panning out the way it did. +and I'm glad to see my initial hunch panning out the way it has. ## Improving the major @@ -716,9 +714,9 @@ measure and improve our code can be a necessary part of working in the industry. The one class that stands out to me as having this issue front and center is the GPGPU course, introducing us to massively parallel programming on a graphics card. However, we were mostly left to our own devices to figure out effective -ways to measure, and analyse those results. Providing more guidance would be a -productive endeavor, ensuring that the students have been provided with the -correct tool set to deal with those problems. +ways to measure and analyse those results. Providing more guidance would be a +productive endeavor, ensuring that the students are provided with the correct +tool set to deal with those problems. ## Introspection @@ -726,23 +724,23 @@ Working abroad, with the additional COVID restrictions, is a harsh transition from the routine of school. However, both the company and the team have made it easy to adjust. -* The daily stand-up meeting, and weekly retrospective seem more important than +* The daily stand-up meeting and weekly retrospective seem more important than ever when you can potentially not talk to your colleagues for days due to working-from-home. 
-* IMC is very pro-active in organising regular events for their employees. This +* IMC is very proactive in organising regular events for their employees. This is a great way to feel more engaged during such a period. They also organised a week of training once the other interns had joined, which created a broader network of relationships in a foreign city. * My mentor encouraged me to ask as many questions as I could when I first -started my internship, and I assisted to some presentations which gave +started my internship, and I attended some presentations which gave additional context about the work being done by the team. This was helpful in getting over the fact of feeling overwhelmed when first getting acquainted with the code and technology being developed and used. -* The gradual transition to return to office, allowing me to arrange one day a -week to work next to my mentor, lead to more one-on-one interaction which feel +* The gradual transition to return to the office, allowing me to arrange one day +a week to work next to my mentor, led to more one-on-one interactions which feel more productive than the usual textual interactions. ## Career evolution @@ -792,13 +790,13 @@ International Marketmakers Combinations (IMC) was founded in 1989 in Amsterdam, by two traders working on the floor of the Amsterdam Equity Options Exchange. At the time trading was executed on the exchange floor by traders manually calculating the price to buy or sell. IMC was ahead of its time, being among the -first to understand the important role that technology and innovation will play -in the evolution of market making. This innovative culture still drives IMC 30 +first to understand the important role that technology and innovation would play +in the evolution of market-making. This innovative culture still drives IMC 30 years later. Since then, they've expanded to multiple continents, with offices operating in -Chicago, Amsterdam, and Sydney. 
Its key insight for trading is based on data and -algorithms, it makes use of its execution platform to provide liquidity to +Chicago, Amsterdam, and Sydney. Their key insight for trading is based on data +and algorithms, making use of their execution platform to provide liquidity to financial markets globally. ## Results & Comments