report/report.md

805 lines
41 KiB
Markdown
Raw Normal View History

2021-04-10 20:13:35 +02:00
# Exec Sum
2021-08-04 19:12:54 +02:00
When making my decision to major in Image Processing and Image Synthesis, my
main motivator was my growing curiosity for high performance programming, as I
thought that some of the subjects being taught would lend themselves well to
furthering my understanding of this domain.\
On my own, I discovered through a number of conference talks that I watched that
the field of finance provides many interesting challenges that aligned with
those interests.
When the opportunity arose to work for IMC, a leading firm in the world of
*market-making*, I jumped at the chance to apply to them for my internship. When
asked for the kind of work I wanted to do during that time, I highlighted my
interests in performance, which lead to the subject of writing a benchmark
framework for their new exchange connectivity layer, currently in the process of
being created and deployed.\
This felt like the perfect subject to learn more about finance, a field I had
been interested for some time at that point, and get exposed to people that are
knowledgeable, to cultivate my budding interest in the field.
During my internship, I was part of the *Global Execution* team at IMC, a new
team that was tasked with working on projects of relevance to multiple desks
(each of whom is dedicated to working on a specific exchange). Their first
contribution being to write the *Execution Gateway*, a piece of software that
should act as the intermediary between IMC's trading algorithm and external
exchanges. This is meant to replace the current pattern of each auto-trader
connecting directly to the exchanges to place orders, migrating instead to a
central piece of infrastructure to do the translation between internal IMC
protocols and decision, and exchange-facing requests.
As part of this migration process, my mission was to work on providing a simple
framework that could be used to measure the performance of such a gateway. To do
so, we must be able to instrument one under various scenarios meant to mirror
real-life conditions, or exercise edge-cases.
This lead me to first get acquainted with the components that go into running
the gateway, and what is necessary on the client side to make use of it through
the *Execution API*, which is in the interface exposed to downstream consumers
of the gateway: the trading algorithms.\
Similarly, I learned what the gateway itself needs to be able to communicate
with an exchange, as I would also need to control that side in the framework to
allow measurements of the time taken for an order from the client to the
exchange.\
To learn about those dependencies, I studied the existing tests for the
*Execution API*, which bundle client, gateway, and exchange in a single process
to test their correct behaviour.
Once I got more familiar with each piece of the puzzle, I got to work on writing
the framework itself. This required writing a few modules that would provide
data needed as input by the gateway as a pre-requisite to being able to run
anything. With those done, getting every component running properly was the next
step. With that done, the last part was to generate some dummy load on the
gateway, and collecting its performance measurements. Those could be analyzed
offline, in a more exploratory manner.\
With that work done, I had delivered my final product relating to my internship
subject. Taken from a first *Proof-Of-Concept* to a working end result. Along
the way I learned about the specific needs relating to the usage of those
benchmarks, and integrated them in the greater IMC ecosystem.
With the initial work on the benchmark framework done, my next area of focus was
to add compatibility testing for the gateway. The point of this work was to add,
as part of the *Continuous Integration* pipeline in use at IMC, tests that would
exercise the actual gateway binaries used in production, to test for both
forward and backward compatibility. This work fits perfectly as a continuation
of what I did to write the benchmark framework: both need to run the gateway and
generate specific scenarios to test its behaviour. This allowed me to reuse most
of the code what I had written for the benchmark, and apply it to writing the
tests.\
The need for reliable tests meant that I had to do a lot of ground work to
ensure that they were not flaky, this is probably the part that took longest in
the process, with some deep investigations to understand some subtle bugs and
behaviours that were exposed by the new tests I was attempting to integrate.
Towards the end of my internship, I presented the work I did on the framework to
other developers in the execution teams. This was in part to showcase the work
being done by the *Global Execution* team, and to participate in the regular
knowledge sharing that happens at IMC.
Joining a company during the COVID period of quarantines, working-from-home, and
the relatively low amount of face-to-face contact high-lighted the need for
efficient ways of communicating with my colleagues. Being part of a productive,
highly driven team has been a pleasure.
Overall, this internship was a great reaffirmation of my career choices. This
was my first experience with an actual project dealing with performance of
something actually used in production, and not simply a one-off project to work
on and leave aside. Being able to contribute something which will be used
long-term, and have impactful consequences, reassures me in the choice of being
a software engineer, and the impact of my studies at EPITA.
\newpage
2021-04-10 20:13:35 +02:00
# Thanks and acknowledgements
2021-07-13 20:11:20 +02:00
First off, I would like to thank Jelle Wissink, an engineer from the Global
Execution team at IMC. As my mentor, he helped me get acquainted with the
technologies used at IMC, guided my explorations of the problems I tackled, and
was of great help to solve problems I encountered during my internship. I would
also like to thank Erdinc Sevim, the lead of the Global Execution team, for the
instructive presentations about trading and IMC's software architecture.
I would also like to thank:
* Laurent Xu, engineer at IMC: he was the first to tell me about the company,
and referred me for an interview. He also welcomed me to Amsterdam and
introduced me to other French colleagues.
* Étienne Renault, a researcher at EPITA's LRDE: he is in charge of the Tiger
Compiler project, and taught the ALGOREP (Distributed Algorithm) class during
the course of my major. He is one of the most interesting teachers I have met,
his classes have always been a joy to attend. I'm glad to have gotten to know
him through the Tiger maintainer team.
* Élodie Puybareau and Guillaume Tochon, researchers at EPITA's LRDE, and head
teachers of the IMAGE major. They are great teachers, very involved, and always
listening to student feedbacks. They have handled the COVID crisis admirably,
taking into account the safety of their students and the work load imposed upon
them.
* The YAKA & ACU teams, a.k.a. the Teaching Assistant teams for EPITA's first
year of the engineering cycle: being a TA was a great source of learning for me.
It was one of the most fun and memorable experiences I had at the school.
Finally, I would like to thank my parents who have always been there for me,
and my girlfriend Sarah for her unwavering support.
2021-04-10 20:13:35 +02:00
# Introduction
2021-07-20 19:36:46 +02:00
My internship is about benchmarking the new service being used at IMC for
connecting to and communicating with exchanges.
IMC is a technology-driven trading company, specializing in market making on
various exchanges world-wide. Due to this position, they strive for continuous
improvement by making use of technology. And in particular, they have to pay
special attention to the performance of their trading system across the whole
infrastructure.
In the face of continuous improvement of their system, the performance aspect of
any upgrade must be kept at the forefront of the mind in order to stay
competitive, and rise to a dominant position globally.
2021-08-01 17:25:58 +02:00
My project fits into the migration of IMC's trading algorithms from their
individual *drivers* connecting each of them directly to the exchanges, to a new
central service being developed to translate and interface between IMC-internal
2021-07-20 19:36:46 +02:00
communication and exchange-facing orders, requests, and notifications.
Given the scale of this change, and how important such a piece of software is in
the trading infrastructure of the company, the performance impacts of its
introduction and further development must be measured, and its evolution
followed closely.
The first part of my internship was about writing a framework to benchmark such
a gateway with a dummy load being generated according to scenarios that can
simulate varying circumstances. From those runs, it is also in charge of
recording the performance measurements that it has gathered from the gateway,
allowing for further analysis of a single run and comparison of their evolution
as time goes on.
This initial work being finished, I integrated my framework with the tooling in
2021-08-01 15:14:14 +02:00
use at IMC to allow for running it more easily, either locally for development
purposes or remotely for measurements. This is also used to test for breakage in
the Continuous Integration pipeline, to keep the benchmarks runnable as changes
are merged into the code base.
2021-07-20 19:36:46 +02:00
Once that was done, I then picked up a user story about compatibility testing:
with the way IMC deploys software, we want to ensure that both the gateway and
its clients are retro and forward compatible to avoid any surprises in
production. This was only ensured at the protocol level when I first worked on
this subject, my goal being to add tests using the actual binaries used in
production to test their behaviour across various versions, ensuring that they
behave identically.
2021-04-10 20:13:35 +02:00
# Subject
2021-04-10 20:19:55 +02:00
The first description of my internship project was given to me as:
> The project is about benchmarking a new service we're building related to
> exchange connectivity. It would involve writing a program to generate load on
> the new service, preparing a test environment and analyzing the performance
> results. Time permitting might also involve making performance improvements to
> the services.
To understand this subject, we must start with an explanation of what exchange
connectivity means at IMC: it is the layer in IMC's architecture that ensures
the connection between internal trading services and external exchanges' own
infrastructure and services. It is at this layer that exchange-specific
protocols are normalised into IMC's own protocol messages, and vice versa.
Here is the list of tasks that I am expected to have accomplished during this
internship:
2021-08-01 15:14:14 +02:00
* Become familiar with the service.
* Write a dummy load generator.
* Benchmark the system under the load.
* Analyze the measurements.
2021-04-10 20:19:55 +02:00
This kind of project is exactly the reason that I was interested in working in
finance and trading. It is a field that is focused on achieving the highest
performance possible, because being faster is directly tied with making more
trades and results in more profits.
Because I expressed this personal interest for working on high performance
systems and related subjects, I was given this internship project to work on.
2021-04-10 20:13:35 +02:00
# Context of the subject
## Company trade
IMC, as its name suggests, is a market maker. It is specialised in providing
liquidity in the market by quoting both sides of the market, and profit off the
trades they make while providing this service.
One key ingredient to this business is latency: due to the competitive nature of
the market, we must process the incoming data and execute orders fast enough not
to get *picked off the market* with a bad position.
## Service
2021-04-10 20:20:57 +02:00
The exchange connectivity layer must route orders as fast possible, to stay
competitive, reduce transaction costs, and lower latencies which could result in
lost opportunities, therefore less profits.
It must also take on other duties, due to it being closer to the exchange than
the rest of the infrastructure. For example, a trading strategy can register
conditional orders with this service: it must monitor the price of product A and
X, if product A's cost rise over X's, then it must start selling product B at
price Y.
## Strategy
2021-08-01 15:14:14 +02:00
A new exchange connectivity service called the Execution Gateway, and its
accompanying *Execution API* to communicate with it, are being built at IMC. The
eventual goal being to migrate all trading strategies to using this gateway to
send orders to exchanges. This will allow it to be scaled more appropriately.
However, care must be taken to maintain the current performance during the
entirety of the migration in order to stay competitive, and the only way to
ensure this is to measure it.
2021-04-10 20:20:57 +02:00
## Roadmap
2021-04-10 20:20:57 +02:00
With that context, let's review my expected tasks once more, and expand on each
of them to get the roadmap:
* Become familiar with the service: before writing the code for the benchmark I
must first understand what goes into the process of a trade at IMC, what is
needed from the gateway and from the clients in order to run them and execute
orders. There is a lot of code at IMC: having different teams working at the
same time on different trading service results in a lot of churn. The global
execution team was created to centralise the work on core services that must be
provided to the rest of the IMC workforce. The global execution gateway is one
such project, aiming to consolidate all trading strategies under one singular
method to send orders to their exchanges.
2021-04-10 20:20:57 +02:00
* Write a dummy load generator: we want to send orders under different
conditions in order to run multiple scenarios which can model varying cases of
execution. Having more data for varying corner cases can make us more confident
of the robustness and efficiency of the service. This is especially needed
becaue of the various roles that the gateway must fulfill: not only must it act
as a bridge for the communication between exchanges and traders, but also as an
order executor. All those cases must be accounted for when writing the different
scenarios.
2021-04-10 20:20:57 +02:00
* Benchmark the system under the load: once we can run those scenarios smoothly
we can start taking multiple measurements. The main one that IMC is interested
2021-08-01 15:14:14 +02:00
in is wire-to-wire latency (abbreviated W2W): the time it takes for a trade
to go from a trading strategy to an exchange. The lower this time, the more
2021-08-01 15:27:29 +02:00
occasions there are to make good trades.
2021-04-10 20:20:57 +02:00
* Analyze the measurements: the global execution team has some initial
expectations of the gateway's performance. A divergence on that part could mean
that the measurements are flawed in some way, or that the gateway is not
performing as expected. Further analysis can be done to look at the difference
2021-08-01 15:14:14 +02:00
between median execution time and the 99th percentile, and analyse the tail of
the timing distribution: the smaller it is the better. Having a low execution
time is necessary, however consistent timing also plays an important role to
make sure that an order will actually be executed by the exchange reliably.
## Internship positioning amongst company works
My work was focused on providing a framework to instrument gateways under
different scenarios.
Once that framework is built, to be effective it must be integrated in the
existing Continuous Integration platform used at IMC. This enables us to track
breaking changes and, eventually, be notified of performance regressions.
That last part is yet to be done, needing to be integrated with the new change
point detection tool currently being developed internally. Once that is done, we
can feed the performance results to automatically see when a regression has been
introduced into the system.
With the knowledge I gained working on this project, my next task was to add
compatibility testing to ensure backward and forward compatibility of the
clients and gateways. This meant having to run the existing tests using the
actual production binaries of the gateway, and making sure the tests keep
working across versions. This is very similar to the way the benchmarks work,
and I could reuse most of the tools developed for the framework to that end.
2021-04-10 20:20:57 +02:00
2021-04-10 20:13:35 +02:00
# Internship roadmap
2021-07-19 11:03:34 +02:00
## Getting acquainted with the code base
2021-04-10 20:21:08 +02:00
The first month was dedicated to familiarizing myself with the vocabulary at
IMC, understanding the context surrounding the team I am working in, and
learning about the different services that are currently being used in their
infrastructure. I had to write a first proof of concept to investigate what, if
any, dependencies would be needed to execute the gateway as a stand-alone system
for the benchmark. This has allowed me to get acquainted with their development
process.
After writing that proof of concept, we were now certain that the benchmark was
2021-08-01 15:14:14 +02:00
a feasible project, with very few actual dependencies to be run. The low amount
of external dependencies meant fewer moving parts for the benchmarks, and a
lower amount of components to setup.\
For the ones that were needed, I had to write small modules that would model
their behaviour, and be configured as part of the framework to provide them as
input to the gateway under instrumentation.
2021-04-10 20:21:08 +02:00
2021-07-19 11:03:34 +02:00
## The framework
With the exploratory phase done, writing the framework was my next task. The
first thing to do was ensuring I could run all the necessary component locally,
not accounting for correct behaviour. Once I got the client communicating to the
gateway, and the gateway connected with the fake exchange, I wrote a few basic
scenarios to ensure that everything was working correctly and reliably.
After writing the basis of the framework and ensuring it was in correct working
order, I integrated it with the build tools used by the developers and the
Continuous Integration pipeline. This allows running a single command to build
and run the benchmark on a local machine, allowing for easier iteration when
writing integrating the benchmark framework with a new exchange, and easy
testing of regressions during the testing pipeline that are run before merging
patches into the code base.
Once this was done, further modifications were done to allow the benchmark to be
run using remote machines, with a lab set-up specially made to replicate the
production environment in a sand-boxed way. This was done in a way to
transparently allow either local or remote runs depending on what is desired,
without further modification of either the benchmark scenarios, or the framework
implementation for each exchange.
Under this setup, thanks to a component of the benchmark framework which can be
used to record and dump performance data collected and emitted by the gateway,
we could take a look at the timings under different scenarios. This showed
results close to the expected values, and demonstrated that the framework was a
viable way to collect this information.
## Compatibility testing
After writing the benchmark framework and integrating it for one exchange, I
picked up another story related to testing the Execution API. Before then, all
Execution API implementations were tested using what is called the *method-based
API*, using a single process to test its behavior. This method was favored
during the transition period to Execution API, essentially being an interface
2021-08-01 15:14:14 +02:00
between it and the *drivers* which connect directly to the exchange: it allowed
for lower transition costs while the rest of the *Execution API* was being
built.
2021-07-19 11:03:34 +02:00
This poses two long-term problems:
* The *request-based API*, making use of a network protocol and a separate
2021-08-01 15:14:14 +02:00
gateway binary, inherently relies on the interaction between a gateway and its
clients. None of the tests so far were able to check the behaviour of client and
server together using this API. Having a way to test the integration between
both components in a repeatable way that is integrated with the Continuous
Integration pipeline is valuable to avoid regressions.
2021-07-19 11:03:34 +02:00
* Some consumers of the *request-based API* in production are going to be in use
for long periods of time without a possibility for upgrades due to
comformability testing. To avoid any problem in production, it is of the up most
importance that the *behavior* stays compatible between versions.
To that end, I endeavoured to do the necessary modifications to the current test
framework to allow running them with the actual gateway binary. This meant the
following:
* Being able to run them without reliable timings: due to the asynchronous
nature of the Execution API, and the use of network communication between the
client, gateway, and exchange, some timing expectations from the tests needed to
be relaxed.
* Because we may be running many tests in parallel, we need to avoid any
hard-coded port value in the tests, allowing us to simply run them all in
parallel without the fear of any cross-talk or interference thanks to this
dynamic port discovery.
Once those changes were done, the tests implemented, and some bugs squashed, we
could make use of those tests to ensure compatibility not just at the protocol
level but up to the observable behaviour.
## Documenting my work
With that work done, I now need to ensure that the relevant knowledge is shared
across the team. This work was two-fold:
* Do a presentation about the benchmark framework: because it only contains the
tools necessary as the basis for running benchmarks, other engineers will need
to pick it up to write new scenarios, or implement the benchmark for new
2021-08-04 19:12:40 +02:00
exchanges. To that end, I presented my work, demonstrated its ease of use and
justified some design decisions.
2021-07-19 11:03:34 +02:00
* How to debug problems in benchmarks and compatibility test runs: due to the
unconventional setup required to run those, investigating a problem when running
either of them necessitates specific steps and different approaches. To help
improve productivity when investigating those, I share how to replicate the test
setup in an easily replicable manner, and explain a few of the methods I have
used to debug problems I encountered during their development.
## Gantt diagram
2021-08-05 15:11:38 +02:00
```plantuml
@startgantt
ganttscale weekly
Project starts 2021-03-01
-- Team 1 --
[Discovering the codebase] as [Discover] lasts 14 days
then [Proof Of Concept] as [POC] lasts 7 days
then [Model RDS Server] as [RDS] lasts 14 days
then [Framework v0] as [v0] lasts 14 days
then [Refactor to v1] as [v1] lasts 14 days
then [Initial results analysis] as [Analysis] lasts 7 days
then [Dumping internal measurements] as [EventAnalysis] lasts 7 days
then [Benchmark different gateway] as [NNF] lasts 14 days
[Compatibility testing] starts 2021-05-30
note bottom
Fixing bugs
Reducing flakiness
end note
[Compatibility testing] ends 2021-08-15
[Intern week] starts 2021-06-28
note bottom
Learn about option theory
end note
[Intern week] ends 2021-07-04
[Holidays] starts 2021-08-16
[Holidays] ends 2021-09-01
@endgantt
```
2021-04-10 20:21:08 +02:00
2021-04-10 20:13:35 +02:00
# Engineering practices
## Delivering a project from scratch to completion
During the course of my internship, I have had to deliver a product from its
first *Proof of Concept* to a usable deliverable, going through various
iterations along the way.
This process started with me getting familiar with the IMC code base, coming up
to speed with the tooling in use, some of the knowledge needed to work on the
existing code, etc... Once I was productive, I could start exploring the
necessary component needed to write a benchmark using the gateway binaries. This
initial POC was written by gutting some existing code in use for the Execution
API tests, splitting it into client and exchange code on one side, and gateway
code on the other.
From this initial experiment, I got more familiar with the different components
that would go into an implementation of the benchmark. With the help of my
mentor, I could identify the needed dependencies that would to be provided to
the gateway binary in order to instrument it under different scenarios.
I worked on writing those components in a way that was usable for the benchmark,
making sure that they were working an tested along the way. One such component
2021-08-01 15:27:29 +02:00
was writing a fake version of the RDS that would be populated from the benchmark
scenario, which provided the information about financial instruments to the
gateway in order to use them in the scenario, e.g: ordering a stock.
I went on to write a first version of the benchmark framework for a specific
gateway and a specific exchange: this served as the basis for further iteration
after receiving feedback about my design. Writing a second benchmark for a
second exchange and gateway lead to more re-design.
The basic components of the benchmark framework were useful outside of their
original intended purposes, as I could reuse them to write the compatibility
tests, which were similar in the way they make use of the gateway to run a
scenario and verify the correct behaviour of our system.
With all that work accomplished, I now needed to share my knowledge to make sure
that my mentor and I were not the only people who knew that this framework
existed, how to use, or how to work with it. To that end, I documented by
writing some useful tips I could give to help debug the benchmarks and tests
using gateway binaries. I also gave a presentation at the end of my internship
to demonstrate how to run a benchmark, and explain the main components of the
framework.
I have delivered a complete, featureful product from scratch to finish, complete
with documentation and demonstration of its use. This is at the heart of our
schooling at EPITA: making us well-rounded engineers that can deliver their work
2021-08-01 15:27:29 +02:00
to completion.
## Acquiring new skills and knowledge
IMC is part of the financial tech sector, taking a position of market-maker on a
large amount of exchanges world-wide.
The financial sector, even though I was attracted to it by my previous exposure
from conference sponsors and blog posts from engineers in the sector, was still
something I was not familiar with when first joining the company.
There is a large amount of vocabulary and knowledge specific to this industry,
not even to mention the infrastructure and tooling in use at IMC, which while
not specific to this industry in particular, was still foreign to me at that
point.
Before starting my internship, I was advised to read a book about high frequency
trading, which gave me some context on how exchanges work, and a few important
words that are part of the financial vocabulary. In addition, I learned about
IMC's trading infrastructure through a number of presentations that my team lead
organised with new hires during the beginning of my internship. This not only
gave me more context about what part of the existing infrastructure was aimed to
be replaced by the new *Execution Gateway* and the *Execution API*. I also got
to learn about some of the basics of pricing theory, which underpins our whole
strategy layer to come up with an appropriate valuation for any product we are
interested in trading.
I got to further learn about trading and option theory during a training week
organized with a dozen other summer interns: we were taught some of the
mathematics that form the basics of valuation reducing risks in trading, the
associated vocabulary, and apply them during workshop exercises in trading with
the other interns.
On the technical side, not only did I learn about the software stack in use at
IMC, as I worked on more and more parts of the code base I discovered new
tooling put in place to work and debug parts of our stack that are too costly to
setup or use on any dev computer. One such solution is the *fullsim* system,
which allows us to simulate our FPGA engines in software, to allow developers to
debug issues with FPGA interaction without having to fiddle with actual FPGA
cards or know how to use them. I also introduced my colleagues to new tools that
they were unaware of, the most prominent being the one I always reach for first
when trying to debug a piece of software: `rr`, which allows one to record a
program's execution and run it under a debugger in a totally deterministic
manner: it allows replaying and rewinding execution at will, making it a great
asset when dealing with issues that are sporadic, or require tricky timings like
networked systems.
IMC encourages knowledge sharing across all teams, it permeates the company
culture, and shows in many ways. An execution engineer is encouraged to learn
about trading, which gives us more context when interacting with traders,
spotting mistakes in new strategies or guiding which features would make sense
to write next. Catch-up meetings are organized regularly between teams.
Presentations are given to teach people about the work that is being
accomplished to improve every part of our infrastructure, from deployment
tooling, developer productivity, to new strategies or components of our systems.
Thanks to my well-rounded education, not only do I feel comfortable being
exposed to all this information. But I have felt confident that I fit in from
the start, and could keep pace with the information that was fed to me. I am
able to pick up those new skills fast, because EPITA taught us the most
important skill of this trade: I learnt how to learn, and how to flourish while
doing so.
2021-04-10 20:13:35 +02:00
# Illustrated analysis of acquired skills
2021-07-21 18:42:04 +02:00
## Working in a team and project management
Delivering a product is the goal of working in a business. However the process
to get from zero to final product is unique to every company.
During our project management classes, we were taught various ways of planning
a product from before the first line of code to its final delivery. The
prevailing culture in our industry in the last decade has been to adopt an agile
work environment: evolve the product and its product backlog alongside each
other, adapting to changing requirements. This is in sharp contrast to the
traditional way of planning projects, called *Waterfall*, where a specification
of behaviour is drawn upfront, and worked on for a long period of time with
little to no modification done to that project charter.
We can say that IMC has embraced a more Agile way of delivering new features:
the products are continuously being worked on and improved, the work being
organized into a backlog of issues, partitioned into epics. And similarly,
the company culture embraces a few of the processes associated with Agile
programming. The one has most affected me is the daily stand up, a meeting
organized in the morning to interact with the rest of the team, summarising the
work that has been accomplished the day before, and what one wishes to work on
during the day.
During the times of remote-work because of COVID, interactions with the team at
large feel more limited than they otherwise would be when working alongside one
another at the office. I have learned to communicate better with my colleagues:
explain what I am working on, reaching out to ask questions, and discussing
issues with them.
## Working in a large code base
IMC has a large body of code written to fulfil their business needs. It is rare
to work on a pre-existing code-base during the school curriculum. The few
projects that do provide you with a basis of code to complete them are at scales
that have nothing to do with what I had to get acclimated to at IMC.
This has multiple ramifications, all linked to the amount of knowledge embedded
in the code. It is simply impossible to understand everything in depth, one
cannot hold the entirety of the logic that has been written in their head.
Due to that difference, my way of writing software and squashing bugs had to
evolve, from an approach that worked on small programs to one that is more
scalable: I could not just dive into a problem head-first, trying to understand
everything that happens down to every detail, before being able to fix the
problem. The amount of minutia is too large, it would not be productive to try
to derive an understanding of the whole application before starting to work on
it.
Instead I had to revise my approach, getting surface level understanding of the
broad strokes, thinking more carefully about the implications of an introduced
change, coming up with a theory and confirming it.
This is only possible because of my prior experience on the large number of
projects I had to work on at EPITA, exposing me to a various subjects and
sharpening my inductive skills, building my intuition for picking out the
important pieces of a puzzle.
## Debugging distributed systems
My work specifically centers around running, interacting with, instrumenting,
and observing production binaries for use in testing or benchmarking.
Due to this, and because nobody writes perfect code the first time, I have had
to inspect and debug various issues between my code and the software I was
running under it. That kind of scenario is difficult to inspect, make sense of,
and debug. The behaviour is distributed over multiple separate processes, each
of which carries its own state.
In a way that is similar to our Distribute Algorithms final project, to tackle
those issues I had to think about the states of my processes carefully. I had to
reflect on the problem I encountered, trying to reverse-engineer what must have
gone wrong to get to that situation, and make further observations to further my
understanding of the issue.
This iterative process of chipping away at the problem until the issue becomes
self-evident is inherent with working on such systems. One cannot just inspect
all the processes at once, and immediately derive what must have happened to
them. It feels more akin to detective work, with the usual suspect not being Mrs
Pink in the living room with the chandelier, but instead my own self having
forgotten to account for an edge case.
# Benefits of the internship
## Contributions to the company
The work I have accomplished during my internship has resulted in tools that can
be used as the basis for extensive testing using production binaries during the
iteration of the development process.
From this work we can retain two main points for IMC:
* An extensible framework to use for benchmarking the gateways, and measure
their performance. Thanks to the ease of writing new scenarios, and the
integration of running the benchmarks with the build system in use at IMC, and
its Continuous Integration pipeline, it can easily be used to monitor the
evolution of performance and watch for regressions. Further down the line, it
can be integrated with the change point detection service that is being
developed in house, to simply contact the relevant people when the system
detects that a regression has happened: the offending change can be identified
more easily that way. This is key to staying competitive, ensuring the latency
of our systems remain as low as possible and do not creep upwards.
* My work on compatibility testing, which is an important step in avoiding any
surprising behaviour or downtime in production. Due to the long turn around time
of upgrades in certain regions, and the cost of lost opportunity for any down
time, minimizing the probability of any problem that could be experienced
results directly in more profits being made.
## Furthering my learning
During my internship, I got to work on a large code base, interact with smart
and knowledgeable colleagues, and tinker on what constitutes the basic bricks of
2021-08-01 15:27:29 +02:00
IMC's production software.
Working at IMC was my first experience with such a large code base, a dizzying
amount of code. It is impossible to wrap you head around *everything* that is
happening in a given program. Up until that point I had only encountered school
projects, of relatively small size and whose behaviour could easily be
understood. Dealing with problems by trying to understand everything that is
happening in a program is a valid strategy for those. It is not, however, a
scalable way of working on software, and I needed change my way of thinking
about and dealing with the problems I encountered during my work. To cope with
that, I learned how to better handle problems I encountered by trying to isolate
the actual source of the problem, instead of trying to understand the whole
system around it.
Interacting with the team was a great help in that endeavour. Knowing who to ask
questions to, and learning how to ask relevant questions are once again
essential in achieving productivity in those circumstances. This is doubly so in
times of remote-working, when turning around and asking your colleague a
question is not so simple. I had trouble at first to actively use the internal
messaging app to ask questions, and was encouraged to ask questions liberally
instead of staying stuck on my own.
2021-04-10 20:13:35 +02:00
# Conclusion
2021-07-15 19:50:19 +02:00
## Education and career objectives
I chose to major in Image Processing and Image Synthesis for multiple reasons,
most notably I had an interest in high performance programming, and thought that
this major would yield well to it. This proved to be true, although more so due
to applying it to the projects that we were given rather than the courses we
were taught (except for a few which specifically focused on it).
Through watching conference presentations, I learned about the field of finance
and thought it would provide interesting challenges that aligned with my
interests. This motivated my choice to intern at IMC, even though their business
is far removed from the core teachings of my major. This too, proved to be true,
and I'm glad to see my initial hunch panning out the way it did.
## Improving the major
Having more focus on measuring results and performances of our projects would be
an interesting idea, to put it into context for the major, the need for real
time image analysis and other such constraints means that having the skills to
measure and improve our code can be a necessary part of working in the industry.
The one class that stands out to me as having this issue front and center is the
GPGPU course, introducing us to massively parallel programming on a graphics
card. However, we were mostly left to our own devices to figure out effective
ways to measure, and analyse those results. Providing more guidance would be a
productive endeavor, ensuring that the students have been provided with the
correct tool set to deal with those problems.
## Introspection
2021-08-01 15:27:29 +02:00
Working abroad, with the additional COVID restrictions, is a harsh transition
from the routine of school. However, both the company and the team have made it
easy to adjust.
2021-07-15 19:50:19 +02:00
* The daily stand-up meeting, and weekly retrospective seem more important than
ever when you can potentially not talk to your colleagues for days due to
working-from-home.
* IMC is very pro-active in organising regular events for their employees. This
is a great way to feel more engaged during such a period. They also organised a
week of training once the other interns had joined, which created a broader
network of relationships in a foreign city.
* My mentor encouraged me to ask as many questions as I could when I first
started my internship, and I assisted to some presentations which gave
additional context about the work being done by the team. This was helpful in
getting over the fact of feeling overwhelmed when first getting acquainted with
the code and technology being developed and used.
* The gradual transition to return to office, allowing me to arrange one day a
week to work next to my mentor, lead to more one-on-one interaction which feel
more productive than the usual textual interactions.
## Career evolution
This internship was everything I expected and more. The people are great, the
company is thriving, the work environment outstanding.
The *fintech* sector is full of interesting problems to me. I loved learning
about the basic theory of trading, what constitutes the basis for our
algorithms' decisions. I had a great deal of enjoyment working on my projects
during the internship, despite the few moments of frustration that come from
working on a distributed system.
Working at IMC, you are surrounded by smart and hard-working engineers, and
encouraged to interact with everybody to spread knowledge. Their focus on
continual improvement means that you are always learning and making yourself
better. Furthermore, they take good care of their employees, the mood is that of
a focused, casual, and playful atmosphere.
All in all, I think that IMC is great place to work at, there are few companies
like it. This will have an impact in how I rate potential future employers, as I
expect few places to be as well-rounded as IMC.
2021-04-10 20:13:35 +02:00
# Appendix
## Vocabulary
Bid and ask
: Respectively the price for buying and selling a stock or other financial
instrument. The closer the spread of the two prices is, the more liquidity there
is in the market for that product.
Continuous Integration
: The practice of automating the integration of code from multiple contributors
into a single software project.
Market-making
: A market-maker provides liquidity to the market by continuously quoting both
sell and trade prices on the market, hoping to make a profit on the *bid-ask
spread*.
2021-04-10 20:13:35 +02:00
## About IMC
International Marketmakers Combinations (IMC) was founded in 1989 in Amsterdam,
by two traders working on the floor of the Amsterdam Equity Options Exchange. At
the time trading was executed on the exchange floor by traders manually
calculating the price to buy or sell. IMC was ahead of its time, being among the
first to understand the important role that technology and innovation will play
in the evolution of market making. This innovative culture still drives IMC 30
years later.
Since then, they've expanded to multiple continents, with offices operating in
Chicago, Amsterdam, and Sydney. Its key insight for trading is based on data and
algorithms, it makes use of its execution platform to provide liquidity to
financial markets globally.
2021-04-10 20:13:35 +02:00
## Results & Comments