report/report.md

495 lines
24 KiB
Markdown

# Exec Sum
# Thanks and acknowledgements
First off, I would like to thank Jelle Wissink, an engineer from the Global
Execution team at IMC. As my mentor, he helped me get acquainted with the
technologies used at IMC, guided my explorations of the problems I tackled, and
was of great help to solve problems I encountered during my internship. I would
also like to thank Erdinc Sevim, the lead of the Global Execution team, for the
instructive presentations about trading and IMC's software architecture.
I would also like to thank:
* Laurent Xu, engineer at IMC: he was the first to tell me about the company,
and referred me for an interview. He also welcomed me to Amsterdam and
introduced me to other French colleagues.
* Étienne Renault, a researcher at EPITA's LRDE: he is in charge of the Tiger
Compiler project, and taught the ALGOREP (Distributed Algorithm) class during
the course of my major. He is one of the most interesting teachers I have met,
his classes have always been a joy to attend. I'm glad to have gotten to know
him through the Tiger maintainer team.
* Élodie Puybareau and Guillaume Tochon, researchers at EPITA's LRDE, and head
teachers of the IMAGE major. They are great teachers, very involved, and always
listening to student feedbacks. They have handled the COVID crisis admirably,
taking into account the safety of their students and the work load imposed upon
them.
* The YAKA & ACU teams, a.k.a. the Teaching Assistant teams for EPITA's first
year of the engineering cycle: being a TA was a great source of learning for me.
It was one of the most fun and memorable experiences I had at the school.
Finally, I would like to thank my parents who have always been there for me,
and my girlfriend Sarah for her unwavering support.
# Introduction
My internship is about benchmarking the new service being used at IMC for
connecting to and communicating with exchanges.
IMC is a technology-driven trading company, specializing in market making on
various exchanges world-wide. Due to this position, they strive for continuous
improvement by making use of technology. And in particular, they have to pay
special attention to the performance of their trading system across the whole
infrastructure.
In the face of continuous improvement of their system, the performance aspect of
any upgrade must be kept at the forefront of the mind in order to stay
competitive, and rise to a dominant position globally.
My project fits into the migration of IMC's trading algorithms from their legacy
*driver* connecting each of them directly to the exchanges, to a new central
service being developed to translate and interface between IMC-internal
communication and exchange-facing orders, requests, and notifications.
Given the scale of this change, and how important such a piece of software is in
the trading infrastructure of the company, the performance impacts of its
introduction and further development must be measured, and its evolution
followed closely.
The first part of my internship was about writing a framework to benchmark such
a gateway with a dummy load being generated according to scenarios that can
simulate varying circumstances. From those runs, it is also in charge of
recording the performance measurements that it has gathered from the gateway,
allowing for further analysis of a single run and comparison of their evolution
as time goes on.
This initial work being finished, I integrated my framework with the tooling in
use at IMC to allow for smoother use of the runner, either locally for
development purposes or remotely for measurements. This is also used to test for
breakage in the Continuous Integration pipeline, to keep the benchmarks runnable
as changes are merged into the code base.
Once that was done, I then picked up a user story about compatibility testing:
with the way IMC deploys software, we want to ensure that both the gateway and
its clients are retro and forward compatible to avoid any surprises in
production. This was only ensured at the protocol level when I first worked on
this subject, my goal being to add tests using the actual binaries used in
production to test their behaviour across various versions, ensuring that they
behave identically.
# Subject
The first description of my internship project was given to me as:
> The project is about benchmarking a new service we're building related to
> exchange connectivity. It would involve writing a program to generate load on
> the new service, preparing a test environment and analyzing the performance
> results. Time permitting might also involve making performance improvements to
> the services.
To understand this subject, we must start with an explanation of what exchange
connectivity means at IMC: it is the layer in IMC's architecture that ensures
the connection between internal trading services and external exchanges' own
infrastructure and services. It is at this layer that exchange-specific
protocols are normalised into IMC's own protocol messages, and vice versa.
Here is the list of tasks that I am expected to have accomplished during this
internship:
* become familiar with the service,
* write a dummy load generator,
* benchmark the system under the load,
* analyze the measurements.
This kind of project is exactly the reason that I was interested in working in
finance and trading. It is a field that is focused on achieving the highest
performance possible, because being faster is directly tied with making more
trades and results in more profits.
Because I expressed this personal interest for working on high performance
systems and related subjects, I was given this internship project to work on.
# Context of the subject
## Company trade
IMC, as its name suggests, is a market maker. It is specialised in providing
liquidity in the market by quoting both sides of the market, and profit off the
trades they make while providing this service.
One key ingredient to this business is latency: due to the competitive nature of
the market, we must process the incoming data and execute orders fast enough not
to get *picked off the market* with a bad position.
## Service
The exchange connectivity layer must route orders as fast possible, to stay
competitive, reduce transaction costs, and lower latencies which could result in
lost opportunities, therefore less profits.
It must also take on other duties, due to it being closer to the exchange than
the rest of the infrastructure. For example, a trading strategy can register
conditional orders with this service: it must monitor the price of product A and
X, if product A's cost rise over X's, then it must start selling product B at
price Y.
## The competition
FIXME: what can I even say about them?
## Strategy
A new exchange connectivity service, called the Execution Gateway, is being
built at IMC, the eventual goal being to migrate all trading strategies to using
this gateway to send orders to exchanges. This will allow it to be scaled more
appropriately. However, care must be taken to maintain the current performance
during the entirety of the migration in order to stay competitive, and the only
way to ensure this is to measure it.
## Roadmap
With that context, let's review my expected tasks once more, and expand on each
of them to get the roadmap:
* Become familiar with the service: before writing the code for the benchmark I
must first understand what goes into the process of a trade at IMC, what is
needed from the gateway and from the clients in order to run them and execute
orders. There is a lot of code at IMC: having different teams working at the
same time on different trading service results in a lot of churn. The global
execution team was created to centralise the work on core services that must be
provided to the rest of the IMC workforce. The global execution gateway is one
such project, aiming to consolidate all trading strategies under one singular
method to send orders to their exchanges.
* Write a dummy load generator: we want to send orders under different
conditions in order to run multiple scenarios which can model varying cases of
execution. Having more data for varying corner cases can make us more confident
of the robustness and efficiency of the service. This is especially needed
becaue of the various roles that the gateway must fulfill: not only must it act
as a bridge for the communication between exchanges and traders, but also as an
order executor. All those cases must be accounted for when writing the different
scenarios.
* Benchmark the system under the load: once we can run those scenarios smoothly
we can start taking multiple measurements. The main one that IMC is interested
in is wall-to-wall latency (abbreviated W2W): the time it takes for a trade to
go from a trading strategy to an exchange. The lower this time, the more
occasions there are to make good trades. FIXME: probably more context in my
notes
* Analyze the measurements: the global execution team has some initial
expectations of the gateway's performance. A divergence on that part could mean
that the measurements are flawed in some way, or that the gateway is not
performing as expected. Further analysis can be done to look at the difference
between mean execution time and the 99th percentile, and analyse the tail of the
timing distribution: the smaller it is the better. Consistent timing is more
important than a lower average, because we must be absolutely confident that a
trade order is going to be executed smoothly, and introducing inconsistent
latency can result in bad trades.
## Internship positioning amongst company works
My work was focused on providing a framework to instrument gateways under
different scenarios.
Once that framework is built, to be effective it must be integrated in the
existing Continuous Integration platform used at IMC. This enables us to track
breaking changes and, eventually, be notified of performance regressions.
That last part is yet to be done, needing to be integrated with the new change
point detection tool currently being developed internally. Once that is done, we
can feed the performance results to automatically see when a regression has been
introduced into the system.
With the knowledge I gained working on this project, my next task was to add
compatibility testing to ensure backward and forward compatibility of the
clients and gateways. This meant having to run the existing tests using the
actual production binaries of the gateway, and making sure the tests keep
working across versions. This is very similar to the way the benchmarks work,
and I could reuse most of the tools developed for the framework to that end.
# Internship roadmap
## Getting acquainted with the code base
The first month was dedicated to familiarizing myself with the vocabulary at
IMC, understanding the context surrounding the team I am working in, and
learning about the different services that are currently being used in their
infrastructure. I had to write a first proof of concept to investigate what, if
any, dependencies would be needed to execute the gateway as a stand-alone system
for the benchmark. This has allowed me to get acquainted with their development
process.
After writing that proof of concept, we were now certain that the benchmark was
a feasible project, with very few actual dependencies to be run: the only one
that we needed to be concerned with it called the RDS server. The RDS server
is responsible for holding the information about all trade-able instruments at
an exchange. The gateway connects to it to receive a snapshot of the state of
those instruments, for example the mapping from IMC IDs to the ones used by the
exchange. I wrote a small module that could be used as a fake RDS server by the
benchmark framework to provide its inputs to the gateway being instrumented.
## The framework
With the exploratory phase done, writing the framework was my next task. The
first thing to do was ensuring I could run all the necessary component locally,
not accounting for correct behaviour. Once I got the client communicating to the
gateway, and the gateway connected with the fake exchange, I wrote a few basic
scenarios to ensure that everything was working correctly and reliably.
After writing the basis of the framework and ensuring it was in correct working
order, I integrated it with the build tools used by the developers and the
Continuous Integration pipeline. This allows running a single command to build
and run the benchmark on a local machine, allowing for easier iteration when
writing integrating the benchmark framework with a new exchange, and easy
testing of regressions during the testing pipeline that are run before merging
patches into the code base.
Once this was done, further modifications were done to allow the benchmark to be
run using remote machines, with a lab set-up specially made to replicate the
production environment in a sand-boxed way. This was done in a way to
transparently allow either local or remote runs depending on what is desired,
without further modification of either the benchmark scenarios, or the framework
implementation for each exchange.
Under this setup, thanks to a component of the benchmark framework which can be
used to record and dump performance data collected and emitted by the gateway,
we could take a look at the timings under different scenarios. This showed
results close to the expected values, and demonstrated that the framework was a
viable way to collect this information.
## Compatibility testing
After writing the benchmark framework and integrating it for one exchange, I
picked up another story related to testing the Execution API. Before then, all
Execution API implementations were tested using what is called the *method-based
API*, using a single process to test its behavior. This method was favored
during the transition period to Execution API, essentially being an interface
between it and the legacy *drivers* which connect directly to the exchange: it
allowed for lower transition costs while the rest of the execution API
This poses two long-term problems:
* The *request-based API*, making use of a network protocol and a separate
gateway binary, cannot be mocked/tested as easily. Having a way to test the
integration between client and server in a repeatable way that is integrated
with the Continuous Integration pipeline is valuable to avoid regressions.
* Some consumers of the *request-based API* in production are going to be in use
for long periods of time without a possibility for upgrades due to
comformability testing. To avoid any problem in production, it is of the up most
importance that the *behavior* stays compatible between versions.
To that end, I endeavoured to do the necessary modifications to the current test
framework to allow running them with the actual gateway binary. This meant the
following:
* Being able to run them without reliable timings: due to the asynchronous
nature of the Execution API, and the use of network communication between the
client, gateway, and exchange, some timing expectations from the tests needed to
be relaxed.
* Because we may be running many tests in parallel, we need to avoid any
hard-coded port value in the tests, allowing us to simply run them all in
parallel without the fear of any cross-talk or interference thanks to this
dynamic port discovery.
Once those changes were done, the tests implemented, and some bugs squashed, we
could make use of those tests to ensure compatibility not just at the protocol
level but up to the observable behaviour.
## Documenting my work
With that work done, I now need to ensure that the relevant knowledge is shared
across the team. This work was two-fold:
* Do a presentation about the benchmark framework: because it only contains the
tools necessary as the basis for running benchmarks, other engineers will need
to pick it up to write new scenarios, or implement the benchmark for new
exchanges. To that end, I FIXME
* How to debug problems in benchmarks and compatibility test runs: due to the
unconventional setup required to run those, investigating a problem when running
either of them necessitates specific steps and different approaches. To help
improve productivity when investigating those, I share how to replicate the test
setup in an easily replicable manner, and explain a few of the methods I have
used to debug problems I encountered during their development.
## Gantt diagram
FIXME
# Engineering practices
Problematic: development of a benchmark framework
# Illustrated analysis of acquired skills
# Benefits of the internship
## Contributions to the company
The work I have accomplished during my internship has resulted in tools that can
be used as the basis for extensive testing using production binaries during the
iteration of the development process.
From this work we can retain two main points for IMC:
* An extensible framework to use for benchmarking the gateways, and measure
their performance. Thanks to the ease of writing new scenarios, and the
integration of running the benchmarks with the build system in use at IMC, and
its Continuous Integration pipeline, it can easily be used to monitor the
evolution of performance and watch for regressions. Further down the line, it
can be integrated with the change point detection service that is being
developed in house, to simply contact the relevant people when the system
detects that a regression has happened: the offending change can be identified
more easily that way. This is key to staying competitive, ensuring the latency
of our systems remain as low as possible and do not creep upwards.
* My work on compatibility testing, which is an important step in avoiding any
surprising behaviour or downtime in production. Due to the long turn around time
of upgrades in certain regions, and the cost of lost opportunity for any down
time, minimizing the probability of any problem that could be experienced
results directly in more profits being made.
## Furthering my learning
During my internship, I got to work on a large code base, interact with smart
and knowledgeable colleagues, and tinker on what constitutes the basic bricks of
IMC's production software (FIXME: phrasing).
Working at IMC was my first experience with such a large code base, a dizzying
amount of code. It is impossible to wrap you head around *everything* that is
happening in a given program. Up until that point I had only encountered school
projects, of relatively small size and whose behaviour could easily be
understood. Dealing with problems by trying to understand everything that is
happening in a program is a valid strategy for those. It is not, however, a
scalable way of working on software, and I needed change my way of thinking
about and dealing with the problems I encountered during my work. To cope with
that, I learned how to better handle problems I encountered by trying to isolate
the actual source of the problem, instead of trying to understand the whole
system around it.
Interacting with the team was a great help in that endeavour. Knowing who to ask
questions to, and learning how to ask relevant questions are once again
essential in achieving productivity in those circumstances. This is doubly so in
times of remote-working, when turning around and asking your colleague a
question is not so simple. I had trouble at first to actively use the internal
messaging app to ask questions, and was encouraged to ask questions liberally
instead of staying stuck on my own.
# Conclusion
## Education and career objectives
I chose to major in Image Processing and Image Synthesis for multiple reasons,
most notably I had an interest in high performance programming, and thought that
this major would yield well to it. This proved to be true, although more so due
to applying it to the projects that we were given rather than the courses we
were taught (except for a few which specifically focused on it).
Through watching conference presentations, I learned about the field of finance
and thought it would provide interesting challenges that aligned with my
interests. This motivated my choice to intern at IMC, even though their business
is far removed from the core teachings of my major. This too, proved to be true,
and I'm glad to see my initial hunch panning out the way it did.
## Improving the major
Having more focus on measuring results and performances of our projects would be
an interesting idea, to put it into context for the major, the need for real
time image analysis and other such constraints means that having the skills to
measure and improve our code can be a necessary part of working in the industry.
The one class that stands out to me as having this issue front and center is the
GPGPU course, introducing us to massively parallel programming on a graphics
card. However, we were mostly left to our own devices to figure out effective
ways to measure, and analyse those results. Providing more guidance would be a
productive endeavor, ensuring that the students have been provided with the
correct tool set to deal with those problems.
## Introspection
Working abroad, with the additional COVID restrictions, is a harsh (FIXME: find
softer term) transition from the routine of school. However, both the company
and the team have made it easy to adjust.
* The daily stand-up meeting, and weekly retrospective seem more important than
ever when you can potentially not talk to your colleagues for days due to
working-from-home.
* IMC is very pro-active in organising regular events for their employees. This
is a great way to feel more engaged during such a period. They also organised a
week of training once the other interns had joined, which created a broader
network of relationships in a foreign city.
* My mentor encouraged me to ask as many questions as I could when I first
started my internship, and I assisted to some presentations which gave
additional context about the work being done by the team. This was helpful in
getting over the fact of feeling overwhelmed when first getting acquainted with
the code and technology being developed and used.
* The gradual transition to return to office, allowing me to arrange one day a
week to work next to my mentor, lead to more one-on-one interaction which feel
more productive than the usual textual interactions.
## Career evolution
This internship was everything I expected and more. The people are great, the
company is thriving, the work environment outstanding.
The *fintech* sector is full of interesting problems to me. I loved learning
about the basic theory of trading, what constitutes the basis for our
algorithms' decisions. I had a great deal of enjoyment working on my projects
during the internship, despite the few moments of frustration that come from
working on a distributed system.
Working at IMC, you are surrounded by smart and hard-working engineers, and
encouraged to interact with everybody to spread knowledge. Their focus on
continual improvement means that you are always learning and making yourself
better. Furthermore, they take good care of their employees, the mood is that of
a focused, casual, and playful atmosphere.
All in all, I think that IMC is great place to work at, there are few companies
like it. This will have an impact in how I rate potential future employers, as I
expect few places to be as well-rounded as IMC.
# Appendix
## Vocabulary
Bid and ask
: Respectively the price for buying and selling a stock or other financial
instrument. The closer the spread of the two prices is, the more liquidity there
is in the market for that product.
Continuous Integration
: The practice of automating the integration of code from multiple contributors
into a single software project.
Market-making
: A market-maker provides liquidity to the market by continuously quoting both
sell and trade prices on the market, hoping to make a profit on the *bid-ask
spread*.
## About IMC
International Marketmakers Combinations (IMC) was founded in 1989 in Amsterdam,
by two traders working on the floor of the Amsterdam Equity Options Exchange. At
the time trading was executed on the exchange floor by traders manually
calculating the price to buy or sell. IMC was ahead of its time, being among the
first to understand the important role that technology and innovation will play
in the evolution of market making. This innovative culture still drives IMC 30
years later.
Since then, they've expanded to multiple continents, with offices operating in
Chicago, Amsterdam, and Sydney. Its key insight for trading is based on data and
algorithms, it makes use of its execution platform to provide liquidity to
financial markets globally.
## Results & Comments