report: final touch ups

Bruno BELANYI 2021-08-08 22:39:00 +02:00
parent 97643ad609
commit 7f22b3c337
1 changed files with 102 additions and 104 deletions

206
report.md

@ -1,4 +1,4 @@
# Exec Sum
# Executive Summary
When making my decision to major in Image Processing and Image Synthesis, my
main motivator was my growing curiosity for high performance programming, as I
@ -11,7 +11,7 @@ those interests.
When the opportunity arose to work for IMC, a leading firm in the world of
*market-making*, I jumped at the chance to apply to them for my internship. When
asked for the kind of work I wanted to do during that time, I highlighted my
interests in performance, which lead to the subject of writing a benchmark
interest in performance, which led to the subject of writing a benchmark
framework for their new exchange connectivity layer, currently in the process of
being created and deployed.\
This felt like the perfect subject to learn more about finance, a field I had
@ -33,7 +33,7 @@ framework that could be used to measure the performance of such a gateway. To do
so, we must be able to instrument one under various scenarios meant to mirror
real-life conditions, or exercise edge-cases.
This lead me to first get acquainted with the components that go into running
This led me to first get acquainted with the components that go into running
the gateway, and what is necessary on the client side to make use of it through
the *Execution API*, which is the interface exposed to downstream consumers
of the gateway: the trading algorithms.\
@ -67,17 +67,17 @@ generate specific scenarios to test its behaviour. This allowed me to reuse most
of the code that I had written for the benchmark, and apply it to writing the
tests.\
The need for reliable tests meant that I had to do a lot of ground work to
ensure that they were not flaky, this is probably the part that took longest in
the process, with some deep investigations to understand some subtle bugs and
ensure that they were not flaky; this is probably the part that took the longest
in the process, with some deep investigations to understand some subtle bugs and
behaviours that were exposed by the new tests I was attempting to integrate.
Towards the end of my internship, I presented the work I did on the framework to
other developers in the execution teams. This was in part to showcase the work
being done by the *Global Execution* team, and to participate in the regular
knowledge sharing that happens at IMC.
being done by the *Global Execution* team, and also to participate in the
regular knowledge sharing that happens at IMC.
Joining a company during the COVID period of quarantines, working-from-home, and
the relatively low amount of face-to-face contact high-lighted the need for
the relatively low amount of face-to-face contact highlighted the need for
efficient ways of communicating with my colleagues. Being part of a productive,
highly driven team has been a pleasure.
@ -90,9 +90,9 @@ a software engineer, and the impact of my studies at EPITA.
\newpage
# Thanks and acknowledgements
# Acknowledgements
First off, I would like to thank Jelle Wissink, an engineer from the Global
Firstly, I would like to thank Jelle Wissink, an engineer from the Global
Execution team at IMC. As my mentor, he helped me get acquainted with the
technologies used at IMC, guided my explorations of the problems I tackled, and
was of great help to solve problems I encountered during my internship. I would
@ -114,7 +114,7 @@ him through the Tiger maintainer team.
* Élodie Puybareau and Guillaume Tochon, researchers at EPITA's LRDE, and head
teachers of the IMAGE major. They are great teachers, very involved, and always
listening to student feedback. They have handled the COVID crisis admirably,
taking into account the safety of their students and the work load imposed upon
taking into account the safety of their students and the workload imposed upon
them.
* The YAKA & ACU teams, a.k.a. the Teaching Assistant teams for EPITA's first
@ -129,8 +129,8 @@ and my girlfriend Sarah for her unwavering support.
My internship is about benchmarking the new service being used at IMC for
connecting to and communicating with exchanges.
IMC is a technology-driven trading company, specializing in market making on
various exchanges world-wide. Due to this position, they strive for continuous
IMC is a technology-driven trading company, specializing in market-making on
various exchanges worldwide. Due to this position, they strive for continuous
improvement by making use of technology. And in particular, they have to pay
special attention to the performance of their trading system across the whole
infrastructure.
@ -186,7 +186,7 @@ the connection between internal trading services and external exchanges' own
infrastructure and services. It is at this layer that exchange-specific
protocols are normalised into IMC's own protocol messages, and vice versa.
Here is the list of tasks that I am expected to have accomplished during this
Here is the list of tasks that I was expected to complete during this
internship:
* Become familiar with the service.
@ -194,21 +194,21 @@ internship:
* Benchmark the system under the load.
* Analyze the measurements.
This kind of project is exactly the reason that I was interested in working in
This kind of project is exactly the reason why I was interested in working in
finance and trading. It is a field that is focused on achieving the highest
performance possible, because being faster is directly tied with making more
trades and results in more profits.
Because I expressed this personal interest for working on high performance
systems and related subjects, I was given this internship project to work on.
The project was therefore perfectly aligned with my interests and skills that I
already have, or hoped to work on further.
# Context of the subject
## Company trade
IMC, as its name suggests, is a market maker. It is specialised in providing
liquidity in the market by quoting both sides of the market, and profit off the
trades they make while providing this service.
liquidity in the market by quoting both sides of the market, and profiting off
the trades they make while providing this service.
One key ingredient to this business is latency: due to the competitive nature of
the market, we must process the incoming data and execute orders fast enough not
@ -223,8 +223,8 @@ lost opportunities, therefore less profits.
It must also take on other duties, due to it being closer to the exchange than
the rest of the infrastructure. For example, a trading strategy can register
conditional orders with this service: it must monitor the price of product A and
X, if product A's cost rise over X's, then it must start selling product B at
price Y.
X, and if product A's cost rises over X's, then it must start selling product B
at price Y.
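
A minimal sketch of such a conditional order, assuming a simple price feed: every name here (`ConditionalOrder`, `SellOrder`, `on_prices`) is hypothetical and does not reflect IMC's actual API. It also assumes the rule fires only once, the first time the condition holds.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SellOrder:
    """Hypothetical order emitted when the condition is met."""
    product: str
    price: float


@dataclass
class ConditionalOrder:
    """Hypothetical rule: sell `target` at `sell_price` once `watched`
    costs more than `reference`."""
    watched: str
    reference: str
    target: str
    sell_price: float
    triggered: bool = False

    def on_prices(self, prices: Mapping[str, float]) -> Optional[SellOrder]:
        # Assumption: the rule fires a single time, on the first price
        # update where the watched product is strictly more expensive.
        if self.triggered:
            return None
        if prices[self.watched] > prices[self.reference]:
            self.triggered = True
            return SellOrder(self.target, self.sell_price)
        return None
```

Here `ConditionalOrder("A", "X", "B", 10.0)` would stay silent while A is cheaper than X, and return a `SellOrder` for B at 10.0 on the first update where A's price exceeds X's.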
## Strategy
@ -241,8 +241,8 @@ ensure this is to measure it.
With that context, let's review my expected tasks once more, and expand on each
of them to get the roadmap:
* Become familiar with the service: before writing the code for the benchmark I
must first understand what goes into the process of a trade at IMC, what is
* Become familiar with the service: before writing the code for the benchmark, I
first needed to understand what goes into the process of a trade at IMC, what is
needed from the gateway and from the clients in order to run them and execute
orders. There is a lot of code at IMC: having different teams working at the
same time on different trading services results in a lot of churn. The global
@ -251,20 +251,20 @@ provided to the rest of the IMC workforce. The global execution gateway is one
such project, aiming to consolidate all trading strategies under one singular
method to send orders to their exchanges.
* Write a dummy load generator: we want to send orders under different
* Write a dummy load generator: we wanted to send orders under different
conditions in order to run multiple scenarios which can model varying cases of
execution. Having more data for varying corner cases can make us more confident
of the robustness and efficiency of the service. This is especially needed
becaue of the various roles that the gateway must fulfill: not only must it act
of the robustness and efficiency of the service. This was especially needed
because of the various roles that the gateway must fulfill: not only must it act
as a bridge for the communication between exchanges and traders, but also as an
order executor. All those cases must be accounted for when writing the different
scenarios.
* Benchmark the system under the load: once we can run those scenarios smoothly
we can start taking multiple measurements. The main one that IMC is interested
in is wire-to-wire latency (abbreviated W2W): the time it takes for a trade
to go from a trading strategy to an exchange. The lower this time, the more
occasions there are to make good trades.
* Benchmark the system under the load: once we could run those scenarios
smoothly we needed to start taking multiple measurements. The main one that IMC
is interested in is wire-to-wire latency (abbreviated W2W): the time it takes
for a trade to go from a trading strategy to an exchange. The lower this time
is, the more occasions there are to make good trades.
* Analyze the measurements: the global execution team has some initial
expectations of the gateway's performance. A divergence on that part could mean
@ -275,7 +275,7 @@ the timing distribution: the smaller it is the better. Having a low execution
time is necessary, however consistent timing also plays an important role to
make sure that an order will actually be executed by the exchange reliably.
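
To illustrate the kind of analysis described above, here is a minimal sketch that summarizes a set of latency samples. It is not the framework's actual code: the function name, the nanosecond unit, and the choice of nearest-rank percentiles are all assumptions made for the example.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_latencies(samples_ns: Sequence[float]) -> Dict[str, float]:
    """Summarize wire-to-wire latency samples (assumed to be in nanoseconds)."""
    if not samples_ns:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ns)

    def percentile(p: float) -> float:
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    median = percentile(50)
    p99 = percentile(99)
    return {
        "median": median,
        "p99": p99,
        "max": ordered[-1],
        # Gap between the tail and the typical case: smaller means more consistent.
        "spread": p99 - median,
        "stdev": statistics.pstdev(ordered),
    }
```

A low `median` captures the need for a low execution time, while a small `spread` or `stdev` captures the consistency of the timing distribution.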
## Internship positioning amongst company works
## Internship positioning among company works
My work was focused on providing a framework to instrument gateways under
different scenarios.
@ -304,7 +304,7 @@ IMC, understanding the context surrounding the team I am working in, and
learning about the different services that are currently being used in their
infrastructure. I had to write a first proof of concept to investigate what, if
any, dependencies would be needed to execute the gateway as a stand-alone system
for the benchmark. This has allowed me to get acquainted with their development
for the benchmark. This allowed me to get acquainted with their development
process.
After writing that proof of concept, we were now certain that the benchmark was
@ -312,13 +312,13 @@ a feasible project, with very few actual dependencies to be run. The low amount
of external dependencies meant fewer moving parts for the benchmarks, and a
lower amount of components to set up.\
For the ones that were needed, I had to write small modules that would model
their behaviour, and be configured as part of the framework to provide them as
their behaviour, and be configured as part of the framework, to provide them as
input to the gateway under instrumentation.
## The framework
With the exploratory phase done, writing the framework was my next task. The
first thing to do was ensuring I could run all the necessary component locally,
first thing to do was ensuring I could run all the necessary components locally,
not accounting for correct behaviour. Once I got the client communicating to the
gateway, and the gateway connected with the fake exchange, I wrote a few basic
scenarios to ensure that everything was working correctly and reliably.
@ -332,10 +332,10 @@ testing of regressions during the testing pipeline that are run before merging
patches into the code base.
Once this was done, further modifications were done to allow the benchmark to be
run using remote machines, with a lab set-up specially made to replicate the
run using remote machines, with a lab setup specially made to replicate the
production environment in a sand-boxed way. This was done in a way to
transparently allow either local or remote runs depending on what is desired,
without further modification of either the benchmark scenarios, or the framework
without further modification of either the benchmark scenarios or the framework
implementation for each exchange.
Under this setup, thanks to a component of the benchmark framework which can be
@ -366,7 +366,7 @@ Integration pipeline is valuable to avoid regressions.
* Some consumers of the *request-based API* in production are going to be in use
for long periods of time without a possibility for upgrades due to
comformability testing. To avoid any problem in production, it is of the up most
conformance testing. To avoid any problem in production, it is of the utmost
importance that the *behavior* stays compatible between versions.
To that end, I endeavoured to do the necessary modifications to the current test
@ -389,8 +389,8 @@ level but up to the observable behaviour.
## Documenting my work
With that work done, I now need to ensure that the relevant knowledge is shared
across the team. This work was two-fold:
With that work done, I now needed to ensure that the relevant knowledge was
shared across the team. This work was two-fold:
* Do a presentation about the benchmark framework: because it only contains the
tools necessary as the basis for running benchmarks, other engineers will need
@ -401,9 +401,9 @@ justified some design decisions.
* How to debug problems in benchmarks and compatibility test runs: due to the
unconventional setup required to run those, investigating a problem when running
either of them necessitates specific steps and different approaches. To help
improve productivity when investigating those, I share how to replicate the test
setup in an easily replicable manner, and explain a few of the methods I have
used to debug problems I encountered during their development.
improve productivity when investigating those, I shared how to replicate the
test setup in an easily replicable manner, and explained a few of the methods I
have used to debug problems I encountered during their development.
## Gantt diagram
@ -447,9 +447,9 @@ end note
## Delivering a project from scratch to completion
During the course of my internship, I have had to deliver a product from its
first *Proof of Concept* to a usable deliverable, going through various
iterations along the way.
During the course of my internship, I had to deliver a product from its first
*Proof of Concept* to a usable deliverable, going through various iterations
along the way.
This process started with me getting familiar with the IMC code base, coming up
to speed with the tooling in use, some of the knowledge needed to work on the
@ -465,7 +465,7 @@ mentor, I could identify the needed dependencies that would to be provided to
the gateway binary in order to instrument it under different scenarios.
I worked on writing those components in a way that was usable for the benchmark,
making sure that they were working an tested along the way. One such component
making sure that they were working and tested along the way. One such component
was writing a fake version of the RDS that would be populated from the benchmark
scenario, which provided the information about financial instruments to the
gateway in order to use them in the scenario, e.g: ordering a stock.
@ -473,7 +473,7 @@ gateway in order to use them in the scenario, e.g: ordering a stock.
I went on to write a first version of the benchmark framework for a specific
gateway and a specific exchange: this served as the basis for further iteration
after receiving feedback about my design. Writing a second benchmark for a
second exchange and gateway lead to more re-design.
second exchange and gateway led to more re-design.
The basic components of the benchmark framework were useful outside of their
original intended purposes, as I could reuse them to write the compatibility
@ -488,19 +488,19 @@ using gateway binaries. I also gave a presentation at the end of my internship
to demonstrate how to run a benchmark, and explain the main components of the
framework.
I have delivered a complete, featureful product from scratch to finish, complete
with documentation and demonstration of its use. This is at the heart of our
I have delivered a complete, featureful product from start to finish, complete
with documentation and demonstration of its use. This is a central goal of our
schooling at EPITA: making us well-rounded engineers that can deliver their work
to completion.
## Acquiring new skills and knowledge
IMC is part of the financial tech sector, taking a position of market-maker on a
large amount of exchanges world-wide.
large number of exchanges worldwide.
The financial sector, even though I was attracted to it by my previous exposure
from conference sponsors and blog posts from engineers in the sector, was still
something I was not familiar with when first joining the company.
unfamiliar to me when first joining the company.
There is a large amount of vocabulary and knowledge specific to this industry,
not even to mention the infrastructure and tooling in use at IMC, which while
@ -511,21 +511,21 @@ Before starting my internship, I was advised to read a book about high frequency
trading, which gave me some context on how exchanges work, and a few important
words that are part of the financial vocabulary. In addition, I learned about
IMC's trading infrastructure through a number of presentations that my team lead
organised with new hires during the beginning of my internship. This not only
gave me more context about what part of the existing infrastructure was aimed to
be replaced by the new *Execution Gateway* and the *Execution API*. I also got
to learn about some of the basics of pricing theory, which underpins our whole
organised with new hires during the beginning of my internship. This gave me
more context about what part of the existing infrastructure was aimed to be
replaced by the new *Execution Gateway* and the *Execution API*. It also taught
me about some of the basics of pricing theory, which underpins our whole
strategy layer to come up with an appropriate valuation for any product we are
interested in trading.
I got to further learn about trading and option theory during a training week
organized with a dozen other summer interns: we were taught some of the
mathematics that form the basics of valuation reducing risks in trading, the
associated vocabulary, and apply them during workshop exercises in trading with
the other interns.
associated vocabulary, and applied them during workshop exercises in trading
with the other interns.
On the technical side, not only did I learn about the software stack in use at
IMC, as I worked on more and more parts of the code base I discovered new
IMC, but as I worked on more and more parts of the code base I discovered new
tooling put in place to work and debug parts of our stack that are too costly to
setup or use on any dev computer. One such solution is the *fullsim* system,
which allows us to simulate our FPGA engines in software, to allow developers to
@ -534,25 +534,24 @@ cards or know how to use them. I also introduced my colleagues to new tools that
they were unaware of, the most prominent being the one I always reach for first
when trying to debug a piece of software: `rr`, which allows one to record a
program's execution and run it under a debugger in a totally deterministic
manner: it allows replaying and rewinding execution at will, making it a great
manner -- it allows replaying and rewinding execution at will, making it a great
asset when dealing with issues that are sporadic, or require tricky timings like
networked systems.
IMC encourages knowledge sharing across all teams, it permeates the company
culture, and shows in many ways. An execution engineer is encouraged to learn
culture and shows in many ways. An execution engineer is encouraged to learn
about trading, which gives us more context when interacting with traders,
spotting mistakes in new strategies or guiding which features would make sense
spotting mistakes in new strategies, or guiding which features would make sense
to write next. Catch-up meetings are organized regularly between teams.
Presentations are given to teach people about the work that is being
accomplished to improve every part of our infrastructure, from deployment
tooling, developer productivity, to new strategies or components of our systems.
Thanks to my well-rounded education, not only do I feel comfortable being
exposed to all this information. But I have felt confident that I fit in from
the start, and could keep pace with the information that was fed to me. I am
able to pick up those new skills fast, because EPITA taught us the most
important skill of this trade: I learnt how to learn, and how to flourish while
doing so.
Thanks to my well-rounded education I felt comfortable being exposed to all this
information. But I also felt confident that I fit in from the start, and could
keep pace with the information that was fed to me. I am able to pick up those
new skills quickly, because EPITA taught us the most important skill of this
trade: I learnt how to learn, and how to flourish while doing so.
# Illustrated analysis of acquired skills
@ -574,15 +573,15 @@ We can say that IMC has embraced a more Agile way of delivering new features:
the products are continuously being worked on and improved, the work being
organized into a backlog of issues, partitioned into epics. And similarly,
the company culture embraces a few of the processes associated with Agile
programming. The one has most affected me is the daily stand up, a meeting
programming. The one that has most affected me is the daily stand up, a meeting
organized in the morning to interact with the rest of the team, summarising the
work that has been accomplished the day before, and what one wishes to work on
during the day.
During the times of remote-work because of COVID, interactions with the team at
During the times of remote work because of COVID, interactions with the team at
large feel more limited than they otherwise would be when working alongside one
another at the office. I have learned to communicate better with my colleagues:
explain what I am working on, reaching out to ask questions, and discussing
explaining what I am working on, reaching out to ask questions, and discussing
issues with them.
## Working in a large code base
@ -600,7 +599,7 @@ Due to that difference, my way of writing software and squashing bugs had to
evolve, from an approach that worked on small programs to one that is more
scalable: I could not just dive into a problem head-first, trying to understand
everything that happens down to every detail, before being able to fix the
problem. The amount of minutia is too large, it would not be productive to try
problem. The amount of minutiae is too large, it would not be productive to try
to derive an understanding of the whole application before starting to work on
it.
@ -615,7 +614,7 @@ important pieces of a puzzle.
## Debugging distributed systems
My work specifically centers around running, interacting with, instrumenting,
My work specifically centered around running, interacting with, instrumenting,
and observing production binaries for use in testing or benchmarking.
Due to this, and because nobody writes perfect code the first time, I have had
@ -633,9 +632,9 @@ understanding of the issue.
This iterative process of chipping away at the problem until the issue becomes
self-evident is inherent with working on such systems. One cannot just inspect
all the processes at once, and immediately derive what must have happened to
them. It feels more akin to detective work, with the usual suspect not being Mrs
Pink in the living room with the chandelier, but instead my own self having
forgotten to account for an edge case.
them. It feels more akin to detective work, with the usual suspect not being
Colonel Mustard in the dining room with the wrench, but instead my own self
having forgotten to account for an edge case.
# Benefits of the internship
@ -653,15 +652,15 @@ integration of running the benchmarks with the build system in use at IMC, and
its Continuous Integration pipeline, it can easily be used to monitor the
evolution of performance and watch for regressions. Further down the line, it
can be integrated with the change point detection service that is being
developed in house, to simply contact the relevant people when the system
developed in-house, to simply contact the relevant people when the system
detects that a regression has happened: the offending change can be identified
more easily that way. This is key to staying competitive, ensuring the latency
of our systems remains as low as possible and does not creep upwards.
* My work on compatibility testing, which is an important step in avoiding any
surprising behaviour or downtime in production. Due to the long turn around time
of upgrades in certain regions, and the cost of lost opportunity for any down
time, minimizing the probability of any problem that could be experienced
surprising behaviour or downtime in production. Due to the long turnaround time
of upgrades in certain regions, and the cost of lost opportunity for any
downtime, minimizing the probability of any problem that could be experienced
results directly in more profits being made.
## Furthering my learning
@ -671,21 +670,20 @@ and knowledgeable colleagues, and tinker on what constitutes the basic bricks of
IMC's production software.
Working at IMC was my first experience with such a large code base, a dizzying
amount of code. It is impossible to wrap you head around *everything* that is
amount of code. It is impossible to wrap your head around *everything* that is
happening in a given program. Up until that point I had only encountered school
projects, of relatively small size and whose behaviour could easily be
understood. Dealing with problems by trying to understand everything that is
happening in a program is a valid strategy for those. It is not, however, a
scalable way of working on software, and I needed change my way of thinking
scalable way of working on software, and I needed to change my way of thinking
about and dealing with the problems I encountered during my work. To cope with
that, I learned how to better handle problems I encountered by trying to isolate
the actual source of the problem, instead of trying to understand the whole
system around it.
that, I learned to isolate the actual source of the problem, instead of trying
to understand the whole system around it.
Interacting with the team was a great help in that endeavour. Knowing who to ask
questions to, and learning how to ask relevant questions are once again
essential in achieving productivity in those circumstances. This is doubly so in
times of remote-working, when turning around and asking your colleague a
essential in achieving productivity in those circumstances. This is doubly true
in times of remote working, when turning around and asking your colleague a
question is not so simple. I had trouble at first actively using the internal
messaging app to ask questions, and was encouraged to ask questions liberally
instead of staying stuck on my own.
@ -695,16 +693,16 @@ instead of staying stuck on my own.
## Education and career objectives
I chose to major in Image Processing and Image Synthesis for multiple reasons,
most notably I had an interest in high performance programming, and thought that
this major would yield well to it. This proved to be true, although more so due
to applying it to the projects that we were given rather than the courses we
were taught (except for a few which specifically focused on it).
notably my interest in high performance programming, and thought that this major
would lend itself well to it. This proved to be true, although more so due to
applying it to the projects that we were given rather than the courses we were
taught (except for a few which specifically focused on it).
Through watching conference presentations, I learned about the field of finance
and thought it would provide interesting challenges that aligned with my
interests. This motivated my choice to intern at IMC, even though their business
is far removed from the core teachings of my major. This too, proved to be true,
and I'm glad to see my initial hunch panning out the way it did.
and I'm glad to see my initial hunch panning out the way it has.
## Improving the major
@ -716,9 +714,9 @@ measure and improve our code can be a necessary part of working in the industry.
The one class that stands out to me as having this issue front and center is the
GPGPU course, introducing us to massively parallel programming on a graphics
card. However, we were mostly left to our own devices to figure out effective
ways to measure, and analyse those results. Providing more guidance would be a
productive endeavor, ensuring that the students have been provided with the
correct tool set to deal with those problems.
ways to measure and analyse those results. Providing more guidance would be a
productive endeavor, ensuring that the students are provided with the correct
tool set to deal with those problems.
## Introspection
@ -726,23 +724,23 @@ Working abroad, with the additional COVID restrictions, is a harsh transition
from the routine of school. However, both the company and the team have made it
easy to adjust.
* The daily stand-up meeting, and weekly retrospective seem more important than
* The daily stand-up meeting and weekly retrospective seem more important than
ever when you can potentially not talk to your colleagues for days due to
working-from-home.
* IMC is very pro-active in organising regular events for their employees. This
* IMC is very proactive in organising regular events for their employees. This
is a great way to feel more engaged during such a period. They also organised a
week of training once the other interns had joined, which created a broader
network of relationships in a foreign city.
* My mentor encouraged me to ask as many questions as I could when I first
started my internship, and I assisted to some presentations which gave
started my internship, and I attended some presentations which gave
additional context about the work being done by the team. This was helpful in
getting over the fact of feeling overwhelmed when first getting acquainted with
the code and technology being developed and used.
* The gradual transition to return to office, allowing me to arrange one day a
week to work next to my mentor, lead to more one-on-one interaction which feel
* The gradual transition to return to the office, allowing me to arrange one day
a week to work next to my mentor, led to more one-on-one interaction which feels
more productive than the usual textual interactions.
## Career evolution
@ -792,13 +790,13 @@ International Marketmakers Combinations (IMC) was founded in 1989 in Amsterdam,
by two traders working on the floor of the Amsterdam Equity Options Exchange. At
the time trading was executed on the exchange floor by traders manually
calculating the price to buy or sell. IMC was ahead of its time, being among the
first to understand the important role that technology and innovation will play
in the evolution of market making. This innovative culture still drives IMC 30
first to understand the important role that technology and innovation would play
in the evolution of market-making. This innovative culture still drives IMC 30
years later.
Since then, they've expanded to multiple continents, with offices operating in
Chicago, Amsterdam, and Sydney. Its key insight for trading is based on data and
algorithms, it makes use of its execution platform to provide liquidity to
Chicago, Amsterdam, and Sydney. Their key insight for trading is based on data
and algorithms, making use of their execution platform to provide liquidity to
financial markets globally.
## Results & Comments