Bruno BELANYI 6597c6bb4a report: add vocabulary section to annex

2021-07-21 18:41:37 +02:00

24 KiB

Exec Sum

Thanks and acknowledgements

First off, I would like to thank Jelle Wissink, an engineer from the Global Execution team at IMC. As my mentor, he helped me get acquainted with the technologies used at IMC, guided my explorations of the problems I tackled, and was of great help in solving the problems I encountered during my internship. I would also like to thank Erdinc Sevim, the lead of the Global Execution team, for the instructive presentations about trading and IMC's software architecture.

I would also like to thank:

Laurent Xu, engineer at IMC: he was the first to tell me about the company, and referred me for an interview. He also welcomed me to Amsterdam and introduced me to other French colleagues.
Étienne Renault, a researcher at EPITA's LRDE: he is in charge of the Tiger Compiler project, and taught the ALGOREP (Distributed Algorithm) class during the course of my major. He is one of the most interesting teachers I have met, and his classes have always been a joy to attend. I'm glad to have gotten to know him through the Tiger maintainer team.
Élodie Puybareau and Guillaume Tochon, researchers at EPITA's LRDE, and head teachers of the IMAGE major. They are great teachers, very involved, and always listening to student feedback. They have handled the COVID crisis admirably, taking into account the safety of their students and the workload imposed upon them.
The YAKA & ACU teams, a.k.a. the Teaching Assistant teams for EPITA's first year of the engineering cycle: being a TA was a great source of learning for me. It was one of the most fun and memorable experiences I had at the school.

Finally, I would like to thank my parents who have always been there for me, and my girlfriend Sarah for her unwavering support.

Introduction

My internship is about benchmarking the new service being used at IMC for connecting to and communicating with exchanges.

IMC is a technology-driven trading company, specializing in market making on various exchanges worldwide. Due to this position, they strive for continuous improvement by making use of technology. In particular, they have to pay special attention to the performance of their trading system across the whole infrastructure.

In the face of continuous improvement of their system, the performance aspect of any upgrade must be kept at the forefront of the mind in order to stay competitive, and rise to a dominant position globally.

My project fits into the migration of IMC's trading algorithms from their legacy driver connecting each of them directly to the exchanges, to a new central service being developed to translate and interface between IMC-internal communication and exchange-facing orders, requests, and notifications.

Given the scale of this change, and how important such a piece of software is in the trading infrastructure of the company, the performance impacts of its introduction and further development must be measured, and its evolution followed closely.

The first part of my internship was about writing a framework to benchmark such a gateway, with a dummy load generated according to scenarios that can simulate varying circumstances. The framework is also in charge of recording the performance measurements gathered from the gateway during those runs, allowing for further analysis of a single run and comparison of results over time.

This initial work being finished, I integrated my framework with the tooling in use at IMC to allow for smoother use of the runner, either locally for development purposes or remotely for measurements. This is also used to test for breakage in the Continuous Integration pipeline, to keep the benchmarks runnable as changes are merged into the code base.

Once that was done, I picked up a user story about compatibility testing: with the way IMC deploys software, we want to ensure that both the gateway and its clients are backward and forward compatible to avoid any surprises in production. When I first worked on this subject, compatibility was only ensured at the protocol level. My goal was to add tests using the actual binaries used in production to check their behaviour across various versions, ensuring that they behave identically.

Subject

The first description of my internship project was given to me as:

The project is about benchmarking a new service we're building related to exchange connectivity. It would involve writing a program to generate load on the new service, preparing a test environment and analyzing the performance results. Time permitting might also involve making performance improvements to the services.

To understand this subject, we must start with an explanation of what exchange connectivity means at IMC: it is the layer in IMC's architecture that ensures the connection between internal trading services and external exchanges' own infrastructure and services. It is at this layer that exchange-specific protocols are normalised into IMC's own protocol messages, and vice versa.

Here is the list of tasks that I am expected to have accomplished during this internship:

become familiar with the service,
write a dummy load generator,
benchmark the system under the load,
analyze the measurements.

This kind of project is exactly the reason that I was interested in working in finance and trading. It is a field that is focused on achieving the highest performance possible, because being faster is directly tied to making more trades, which results in more profits.

Because I expressed this personal interest in working on high performance systems and related subjects, I was given this internship project to work on.

Context of the subject

Company trade

IMC, as its name suggests, is a market maker. It specialises in providing liquidity by quoting both sides of the market, and profits from the trades it makes while providing this service.

One key ingredient to this business is latency: due to the competitive nature of the market, we must process the incoming data and execute orders fast enough not to get picked off the market with a bad position.

Service

The exchange connectivity layer must route orders as fast as possible, to stay competitive, reduce transaction costs, and lower latencies which could result in lost opportunities, and therefore lower profits.

It must also take on other duties, due to it being closer to the exchange than the rest of the infrastructure. For example, a trading strategy can register conditional orders with this service: it must monitor the prices of products A and X, and if product A's price rises above X's, it must start selling product B at price Y.
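The conditional order described above can be illustrated with a minimal sketch. It is not IMC's implementation: `ToyGateway`, `ConditionalOrder`, and `a_above_x` are hypothetical names introduced only for this example, and prices are plain floats rather than real market data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ConditionalOrder:
    """Hypothetical order: sell `product` at `price` once `trigger` holds."""
    trigger: Callable[[Dict[str, float]], bool]
    product: str
    price: float
    fired: bool = False


class ToyGateway:
    """Hypothetical stand-in for the service: tracks prices, fires orders once."""

    def __init__(self) -> None:
        self.prices: Dict[str, float] = {}
        self.pending: List[ConditionalOrder] = []
        self.sent: List[Tuple[str, str, float]] = []

    def register(self, order: ConditionalOrder) -> None:
        self.pending.append(order)

    def on_price(self, product: str, price: float) -> None:
        # Record the latest price, then re-evaluate every order not yet fired.
        self.prices[product] = price
        for order in self.pending:
            if not order.fired and order.trigger(self.prices):
                order.fired = True
                self.sent.append(("SELL", order.product, order.price))


def a_above_x(prices: Dict[str, float]) -> bool:
    """True once both prices are known and A trades above X."""
    return "A" in prices and "X" in prices and prices["A"] > prices["X"]


gateway = ToyGateway()
gateway.register(ConditionalOrder(trigger=a_above_x, product="B", price=10.0))
gateway.on_price("X", 5.0)
gateway.on_price("A", 4.0)  # A is still below X: nothing is sent
gateway.on_price("A", 6.0)  # A rises above X: the sell order for B fires
print(gateway.sent)
```

Keeping the condition next to the price feed means the order can fire without a round trip to the strategy, which is the reason given above for placing this duty in the connectivity layer.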

The competition

FIXME: what can I even say about them?

Strategy

A new exchange connectivity service, called the Execution Gateway, is being built at IMC, the eventual goal being to migrate all trading strategies to using this gateway to send orders to exchanges. This will allow it to be scaled more appropriately. However, care must be taken to maintain the current performance during the entirety of the migration in order to stay competitive, and the only way to ensure this is to measure it.

Roadmap

With that context, let's review my expected tasks once more, and expand on each of them to get the roadmap:

Become familiar with the service: before writing the code for the benchmark I must first understand what goes into the process of a trade at IMC, what is needed from the gateway and from the clients in order to run them and execute orders. There is a lot of code at IMC: having different teams working at the same time on different trading services results in a lot of churn. The global execution team was created to centralise the work on core services that must be provided to the rest of the IMC workforce. The global execution gateway is one such project, aiming to consolidate all trading strategies under one singular method to send orders to their exchanges.
Write a dummy load generator: we want to send orders under different conditions in order to run multiple scenarios which can model varying cases of execution. Having more data for varying corner cases can make us more confident of the robustness and efficiency of the service. This is especially needed because of the various roles that the gateway must fulfill: not only must it act as a bridge for the communication between exchanges and traders, but also as an order executor. All those cases must be accounted for when writing the different scenarios.
Benchmark the system under the load: once we can run those scenarios smoothly we can start taking multiple measurements. The main one that IMC is interested in is wall-to-wall latency (abbreviated W2W): the time it takes for a trade to go from a trading strategy to an exchange. The lower this time, the more opportunities there are to make good trades. FIXME: probably more context in my notes
Analyze the measurements: the global execution team has some initial expectations of the gateway's performance. A divergence from those expectations could mean that the measurements are flawed in some way, or that the gateway is not performing as expected. Further analysis can be done to look at the difference between mean execution time and the 99th percentile, and analyse the tail of the timing distribution: the smaller it is the better. Consistent timing is more important than a lower average, because we must be absolutely confident that a trade order is going to be executed smoothly, and introducing inconsistent latency can result in bad trades.
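To make the analysis step more concrete, here is a small sketch of how the mean, the 99th percentile, and the gap between them could be computed from a list of latency samples. The helper names `percentile` and `summarize`, the nearest-rank percentile definition, and the sample values are assumptions made for this illustration; they are not taken from the framework itself.

```python
from statistics import fmean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling division on integers avoids floating point rounding on the rank.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]


def summarize(latencies_us: Sequence[float]) -> Dict[str, float]:
    """Summarize a run: mean, median, 99th percentile, and the p99-to-mean gap."""
    mean = fmean(latencies_us)
    p99 = percentile(latencies_us, 99)
    return {
        "mean": mean,
        "p50": percentile(latencies_us, 50),
        "p99": p99,
        "tail_gap": p99 - mean,
    }


# Made-up samples in microseconds: 98 fast orders and 2 slow outliers.
samples = [10.0] * 98 + [110.0] * 2
stats = summarize(samples)
print(stats)
```

In this made-up run the mean stays close to the typical latency while the 99th percentile exposes the outliers, which is why the tail matters more than the average when consistency is the goal.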

Internship positioning amongst company works

My work was focused on providing a framework to instrument gateways under different scenarios.

Once that framework is built, to be effective it must be integrated in the existing Continuous Integration platform used at IMC. This enables us to track breaking changes and, eventually, be notified of performance regressions. That last part is yet to be done, needing to be integrated with the new change point detection tool currently being developed internally. Once that is done, we can feed the performance results to automatically see when a regression has been introduced into the system.

With the knowledge I gained working on this project, my next task was to add compatibility testing to ensure backward and forward compatibility of the clients and gateways. This meant having to run the existing tests using the actual production binaries of the gateway, and making sure the tests keep working across versions. This is very similar to the way the benchmarks work, and I could reuse most of the tools developed for the framework to that end.

Internship roadmap

Getting acquainted with the code base

The first month was dedicated to familiarizing myself with the vocabulary at IMC, understanding the context surrounding the team I am working in, and learning about the different services that are currently being used in their infrastructure. I had to write a first proof of concept to investigate what, if any, dependencies would be needed to execute the gateway as a stand-alone system for the benchmark. This has allowed me to get acquainted with their development process.

After writing that proof of concept, we were now certain that the benchmark was a feasible project, with very few actual dependencies to be run: the only one that we needed to be concerned with is called the RDS server. The RDS server is responsible for holding the information about all trade-able instruments at an exchange. The gateway connects to it to receive a snapshot of the state of those instruments, for example the mapping from IMC IDs to the ones used by the exchange. I wrote a small module that could be used as a fake RDS server by the benchmark framework to provide its inputs to the gateway being instrumented.

The framework

With the exploratory phase done, writing the framework was my next task. The first thing to do was ensuring I could run all the necessary components locally, not accounting for correct behaviour. Once I got the client communicating with the gateway, and the gateway connected with the fake exchange, I wrote a few basic scenarios to ensure that everything was working correctly and reliably.

After writing the basis of the framework and ensuring it was in correct working order, I integrated it with the build tools used by the developers and the Continuous Integration pipeline. This allows running a single command to build and run the benchmark on a local machine, allowing for easier iteration when integrating the benchmark framework with a new exchange, and easy testing of regressions in the testing pipeline that is run before merging patches into the code base.

Once this was done, further modifications were done to allow the benchmark to be run using remote machines, with a lab set-up specially made to replicate the production environment in a sand-boxed way. This was done in a way to transparently allow either local or remote runs depending on what is desired, without further modification of either the benchmark scenarios, or the framework implementation for each exchange.

Under this setup, thanks to a component of the benchmark framework which can be used to record and dump performance data collected and emitted by the gateway, we could take a look at the timings under different scenarios. This showed results close to the expected values, and demonstrated that the framework was a viable way to collect this information.

Compatibility testing

After writing the benchmark framework and integrating it for one exchange, I picked up another story related to testing the Execution API. Before then, all Execution API implementations were tested using what is called the method-based API, using a single process to test its behavior. This method was favored during the transition period to Execution API, essentially being an interface between it and the legacy drivers which connect directly to the exchange: it allowed for lower transition costs while the rest of the execution API

This poses two long-term problems:

The request-based API, making use of a network protocol and a separate gateway binary, cannot be mocked/tested as easily. Having a way to test the integration between client and server in a repeatable way that is integrated with the Continuous Integration pipeline is valuable to avoid regressions.
Some consumers of the request-based API in production are going to be in use for long periods of time without a possibility for upgrades due to conformance testing. To avoid any problem in production, it is of the utmost importance that the behavior stays compatible between versions.

To that end, I endeavoured to do the necessary modifications to the current test framework to allow running them with the actual gateway binary. This meant the following:

Being able to run them without reliable timings: due to the asynchronous nature of the Execution API, and the use of network communication between the client, gateway, and exchange, some timing expectations from the tests needed to be relaxed.
Avoiding any hard-coded port value in the tests: because we may be running many tests in parallel, ports need to be discovered dynamically, allowing us to run all the tests in parallel without the fear of any cross-talk or interference.
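Both adjustments can be sketched with the standard library alone. This is only an illustration under assumptions: `reserve_listener` and `wait_until` are hypothetical helper names, and the actual test framework may discover ports and relax timings differently. Binding to port 0 asks the operating system for a free port, and polling against a deadline replaces a fixed timing expectation.

```python
import socket
import time
from typing import Callable


def reserve_listener() -> socket.socket:
    """Bind to port 0 so the operating system picks a currently free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    return listener


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 5.0,
               poll_s: float = 0.01) -> bool:
    """Poll `condition` until it holds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    # One last check so a condition met right at the deadline still counts.
    return condition()


# Two listeners opened at the same time never share a port.
first, second = reserve_listener(), reserve_listener()
port_a = first.getsockname()[1]
port_b = second.getsockname()[1]
first.close()
second.close()

ready = wait_until(lambda: True)
timed_out = wait_until(lambda: False, timeout_s=0.05)
print(port_a, port_b, ready, timed_out)
```

A test would pass the discovered port to the process under test instead of a constant, and would wrap its expectations in `wait_until` rather than asserting right after sending a request.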

Once those changes were done, the tests implemented, and some bugs squashed, we could make use of those tests to ensure compatibility not just at the protocol level but up to the observable behaviour.

Documenting my work

With that work done, I then needed to ensure that the relevant knowledge was shared across the team. This work was two-fold:

Do a presentation about the benchmark framework: because it only contains the tools necessary as the basis for running benchmarks, other engineers will need to pick it up to write new scenarios, or implement the benchmark for new exchanges. To that end, I FIXME
How to debug problems in benchmarks and compatibility test runs: due to the unconventional setup required to run those, investigating a problem when running either of them necessitates specific steps and different approaches. To help improve productivity when investigating those, I share how to reproduce the test setup easily, and explain a few of the methods I have used to debug problems I encountered during their development.

Gantt diagram

FIXME

Engineering practices

Problematic: development of a benchmark framework

Illustrated analysis of acquired skills

Benefits of the internship

Contributions to the company

The work I have accomplished during my internship has resulted in tools that can be used as the basis for extensive testing using production binaries during the iteration of the development process.

From this work we can retain two main points for IMC:

An extensible framework to use for benchmarking the gateways and measuring their performance. Thanks to the ease of writing new scenarios, and the integration of the benchmarks with the build system in use at IMC and its Continuous Integration pipeline, it can easily be used to monitor the evolution of performance and watch for regressions. Further down the line, it can be integrated with the change point detection service that is being developed in house, to simply contact the relevant people when the system detects that a regression has happened: the offending change can be identified more easily that way. This is key to staying competitive, ensuring the latency of our systems remains as low as possible and does not creep upwards.
My work on compatibility testing, which is an important step in avoiding any surprising behaviour or downtime in production. Due to the long turnaround time of upgrades in certain regions, and the cost of lost opportunity for any downtime, minimizing the probability of any problem that could be experienced results directly in more profits being made.
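As a rough illustration of what detecting a regression from a series of benchmark results could look like, here is a deliberately naive mean-shift search. It is not the in-house change point detection service, whose design is not described here: the function `find_mean_shift`, its parameters, the 10% threshold, and the sample values are all assumptions made for this sketch.

```python
from statistics import fmean
from typing import Optional, Sequence


def find_mean_shift(series: Sequence[float],
                    min_segment: int = 3,
                    threshold: float = 0.10) -> Optional[int]:
    """Return the index at which the mean rises the most, if the relative
    rise exceeds `threshold`; otherwise return None.

    Assumes strictly positive values, such as latencies.
    """
    best_index: Optional[int] = None
    best_rise = 0.0
    # Try every split that leaves at least `min_segment` points on each side.
    for split in range(min_segment, len(series) - min_segment + 1):
        before = fmean(series[:split])
        after = fmean(series[split:])
        rise = (after - before) / before
        if rise > best_rise:
            best_index, best_rise = split, rise
    return best_index if best_rise > threshold else None


# Made-up 99th percentile latencies, one value per merged change.
regressed = [100, 101, 99, 100, 102, 120, 121, 119, 120]
stable = [100, 101, 99, 100, 101, 100, 99, 100]

print(find_mean_shift(regressed))
print(find_mean_shift(stable))
```

The returned index points at the first run after the shift, which narrows down the change to investigate. A production tool would also need to account for measurement noise and for multiple change points.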

Furthering my learning

During my internship, I got to work on a large code base, interact with smart and knowledgeable colleagues, and tinker on what constitutes the basic bricks of IMC's production software (FIXME: phrasing).

Working at IMC was my first experience with such a large code base, a dizzying amount of code. It is impossible to wrap your head around everything that is happening in a given program. Up until that point I had only encountered school projects, of relatively small size and whose behaviour could easily be understood. Dealing with problems by trying to understand everything that is happening in a program is a valid strategy for those. It is not, however, a scalable way of working on software, and I needed to change my way of thinking about and dealing with the problems I encountered during my work. To cope with that, I learned to isolate the actual source of a problem, instead of trying to understand the whole system around it.

Interacting with the team was a great help in that endeavour. Knowing whom to ask, and learning how to ask relevant questions, are once again essential in achieving productivity in those circumstances. This is doubly so in times of remote working, when turning around and asking your colleague a question is not so simple. At first I had trouble actively using the internal messaging app to ask questions, and was encouraged to ask questions liberally instead of staying stuck on my own.

Conclusion

Education and career objectives

I chose to major in Image Processing and Image Synthesis for multiple reasons. Most notably, I had an interest in high performance programming, and thought that this major would lend itself well to it. This proved to be true, although more so due to applying it to the projects that we were given rather than the courses we were taught (except for a few which specifically focused on it).

Through watching conference presentations, I learned about the field of finance and thought it would provide interesting challenges that aligned with my interests. This motivated my choice to intern at IMC, even though their business is far removed from the core teachings of my major. This, too, proved to be true, and I'm glad to see my initial hunch panning out the way it did.

Improving the major

Having more focus on measuring the results and performance of our projects would be an interesting idea. To put it into context for the major, the need for real-time image analysis and other such constraints means that the skills to measure and improve our code can be a necessary part of working in the industry.

The one class that stands out to me as having this issue front and center is the GPGPU course, introducing us to massively parallel programming on a graphics card. However, we were mostly left to our own devices to figure out effective ways to measure performance and analyse the results. Providing more guidance would be a productive endeavor, ensuring that students are given the correct tool set to deal with those problems.

Introspection

Working abroad, with the additional COVID restrictions, is a harsh (FIXME: find softer term) transition from the routine of school. However, both the company and the team have made it easy to adjust.

The daily stand-up meeting and weekly retrospective seem more important than ever when you can potentially not talk to your colleagues for days due to working from home.
IMC is very pro-active in organising regular events for their employees. This is a great way to feel more engaged during such a period. They also organised a week of training once the other interns had joined, which created a broader network of relationships in a foreign city.
My mentor encouraged me to ask as many questions as I could when I first started my internship, and I attended some presentations which gave additional context about the work being done by the team. This was helpful in getting over the feeling of being overwhelmed when first getting acquainted with the code and technology being developed and used.
The gradual return to the office, allowing me to arrange one day a week to work next to my mentor, led to more one-on-one interactions, which feel more productive than the usual textual interactions.

Career evolution

This internship was everything I expected and more. The people are great, the company is thriving, the work environment outstanding.

The fintech sector is full of interesting problems to me. I loved learning about the basic theory of trading, what constitutes the basis for our algorithms' decisions. I had a great deal of enjoyment working on my projects during the internship, despite the few moments of frustration that come from working on a distributed system.

Working at IMC, you are surrounded by smart and hard-working engineers, and encouraged to interact with everybody to spread knowledge. Their focus on continual improvement means that you are always learning and making yourself better. Furthermore, they take good care of their employees: the atmosphere is focused, casual, and playful.
All in all, I think that IMC is a great place to work, and there are few companies like it. This will have an impact on how I rate potential future employers, as I expect few places to be as well-rounded as IMC.

Appendix

Vocabulary

Bid and ask: Respectively the price for buying and selling a stock or other financial instrument. The narrower the spread between the two prices, the more liquidity there is in the market for that product.
Continuous Integration: The practice of automating the integration of code from multiple contributors into a single software project.
Market-making: A market-maker provides liquidity to the market by continuously quoting both buy and sell prices on the market, hoping to make a profit on the bid-ask spread.

About IMC

International Marketmakers Combinations (IMC) was founded in 1989 in Amsterdam, by two traders working on the floor of the Amsterdam Equity Options Exchange. At the time, trading was executed on the exchange floor by traders manually calculating the price to buy or sell. IMC was ahead of its time, being among the first to understand the important role that technology and innovation would play in the evolution of market making. This innovative culture still drives IMC 30 years later.

Since then, it has expanded to multiple continents, with offices operating in Chicago, Amsterdam, and Sydney. Its key insight for trading is based on data and algorithms: it makes use of its execution platform to provide liquidity to financial markets globally.
