In this discussion with Alex Tabb of the Tabb Group, Paul Haefele, Deep Value’s Managing Director of Technology, talks about how Deep Value harnesses big data technologies to optimize its high-performance algorithms. He discusses how Deep Value integrated Hadoop with its fault-tolerant, low-latency execution engine so that its research department can easily test new ideas, see whether they work in simulation, and then move them straightforwardly into production.
Chicago, IL, March 4 – Deep Value, developer of high performance trading algorithms, posted record trading volumes in 2012 through its Broker Dealer entity Deep Value Enclave. The company also significantly increased its development and deployment of big data technology in 2012.
Deep Value achieved its highest-volume trading day in September 2012, processing 3.7 per cent of US-wide stock market trading volume and more than doubling its 2011 high of 1.8 per cent. On that day it traded over 300 million shares and over 12 billion US dollars. Trading 400 billion dollars in 2012, the firm contributed between one-half and 1 per cent of overall daily US stock market volume virtually every day.
“Our company-wide commitment to best execution is enabling us to capture increasingly larger shares of trading volumes in the US,” said Harish Devarajan, CEO of Deep Value. “Our success lies in maintaining one of the largest teams in the world dedicated solely to research and development of algorithmic trading in U.S. markets, supported by an equally ambitious big data strategy.”
The company’s big data strategy further matured in 2012, augmenting an Amazon EC2 outsourced solution with an in-house cluster comprising 1,600 cores. Deep Value continued to invest in building out its custom simulation environment, which leverages Hadoop to run massive simulations that make it possible for the company to create fine-grained production improvements. The environment focuses heavily on empowering researchers by providing an easy-to-use development environment, accurate market models and tools to manage and understand outputs from large calculations.
“We looked to big data technologies to be able to answer what-if questions on the large data sets we deal with. Applying big data solutions on large data sets creates new big data challenges in how to manage and extract meaning from those outputs. In current work, we are using other software systems from the big data open source ecosystem to analyze automated output data and present what we believe will be insightful summaries,” said Paul Haefele, Managing Director, Technology for Deep Value.
Deep Value has offices in Chicago, Toronto, New York, London, Bangalore and Chennai.
About Deep Value
Deep Value is focused entirely on developing the world’s best trading algorithms. The firm contributes between one-half and 1 per cent of overall equity trading volume to US stock markets daily, and represented 3.7 per cent of overall US stock market volume in its 2012 daily high. The company’s world-class technology solution and platform is installed on-site at client locations as well as at co-located datacenters. Clients include prominent hedge funds and other prestigious financial services powerhouses. In addition, Deep Value is the dominant Exchange-sponsored provider of algorithms to all brokers on the Floor of the NYSE. Deep Value has developed its own distributed, fault-tolerant trading platform on top of industry-standard open-source components. This trading platform can also be run in a cluster-based simulation framework, allowing Deep Value’s research organization to bring very large clusters of machines to bear on sophisticated analysis aimed at improving performance. For more information visit: www.deepvalue.net
Facing a similar big data issue, Deep Value, a developer of algorithms, backtests its algorithms “on a multitude of orders across many months of historic data,” says CEO Harish Devarajan. To gain an edge, it also must simulate how the algos would have worked across hundreds of days of trading. The next phase is to ask “what if” questions of the data from hundreds of machines, which actually creates a new problem: “storing an ocean of data,” Devarajan says.
Are Wall Street Firms Looking to Third-Party Data Centers To Take Advantage of Efficient Cloud Computing? – Wall Street and Tech Article
Two and a half years ago, Deep Value, a Chicago-based provider of algorithms to buy- and sell-side firms that also has a broker-dealer affiliate, began using big data analytics to improve executions. Despite the depressed environment in U.S. cash equities, CEO Harish Devarajan says, the firm’s business has grown well: it executed nearly 3 percent of overall market volume on its highest-volume day this summer.
Taken together, the perception is the industry is losing control. “The complexity of some systems overcomes the best efforts of designers to keep them under control,” says Harish Devarajan, chief executive of Deep Value, a developer of trading algorithms used at the New York Stock Exchange and elsewhere. “All systems start off as things that do our bidding. But some rise in complexity to the point where we masters become the servants of the system.”
Cover Story: Last Vestige for Small Caps – Traders Magazine
Harish Devarajan, CEO of algorithmic firm Deep Value, said Reg NMS and the fragmentation it created had already delivered a huge blow to upstairs block desks before the financial crisis happened, and the thinning out of the Street after the crisis continued to push the buyside toward algos. It was those factors as much as improvements in algorithms that led to algos’ increased share in small-cap trades.
Deep Value runs Hadoop at scale on EC2, but we find that running our own cluster is significantly cheaper
We have been using Amazon’s EC2 with Hadoop for a number of years to run simulations of various stock trading algorithms. We have found EC2 to be quite useful for spinning up large clusters of machines on short notice and for deploying Hadoop clusters generally.
The monthly bills, however, became more and more eye-popping ($70,000/month and growing), and some rough back-of-the-envelope calculations led me to believe that what we were paying for storage and compute was excessive.
The long and the short of it is that Amazon’s EC2 service costs roughly 3.8 times as much as running our own hardware. Of course, EC2 can be provisioned on demand, but such a large multiple certainly makes an internal cluster a key part of our ongoing Hadoop strategy. Read on for our story…
The back of the envelope
The back-of-the-envelope calculation is this: Tiger Direct will sell you a Seagate 3 terabyte drive for $154. For the same storage on S3 for 2 years, I would pay (1,000 GB * $0.125 + 2,000 GB * $0.11) * 12 months * 2 years = $8,280 at the standard rates. Buying our own drive cost about 2% of storing the same data on Amazon, so this certainly seemed worth investigating.
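The expression above can be evaluated directly as a sanity check. This minimal sketch uses only the two S3 rates quoted in the post (dollars per GB-month) and, like the post, treats 1 TB as 1,000 GB.

```python
# Back-of-the-envelope check: one 3 TB drive versus the same capacity on S3.
FIRST_TB_RATE = 0.125   # $ per GB-month for the first 1,000 GB
NEXT_TIER_RATE = 0.11   # $ per GB-month for the remaining 2,000 GB
MONTHS = 12 * 2         # two-year horizon
DRIVE_PRICE = 154       # one Seagate 3 TB drive from Tiger Direct

monthly_s3 = 1_000 * FIRST_TB_RATE + 2_000 * NEXT_TIER_RATE
two_year_s3 = monthly_s3 * MONTHS
drive_share = DRIVE_PRICE / two_year_s3   # fraction of the S3 bill

print(f"S3 per month:    ${monthly_s3:,.2f}")
print(f"S3 over 2 years: ${two_year_s3:,.2f}")
print(f"Drive as a share of the S3 cost: {drive_share:.1%}")
```

The drive works out to just under 2% of two years of S3 storage, before accounting for the server, power and replication that a real deployment adds.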
Deep Value’s Cluster
To do the investigation we deployed a Hadoop cluster at Telx in Clifton, NJ. Telx offers competitive rates and great access to the US exchanges. Power, hosting and bandwidth costs come to roughly $185 per server per month.
We purchased 20 Linux servers running CentOS from Silicon Mechanics for $7,418 per server. These are fairly powerful commodity servers, each sporting 2 Intel E5-2650 processors, 64GB of RAM and 8 x 3TB hard drives. The 20 machines spanned 2 racks with space to spare, and each rack needed a switch (~$3,000). The total cost is thus (7,418 * 20 + 3,000 * 2) = $154,360. Amortized over a 2-year period at 5% interest, this works out to roughly $7,700 per month for the hardware.
Then there are the hosting costs. With various minimums in place (you can’t really rent half a rack), for this analysis we will ballpark the hosting cost for these 20 servers at $5,000 per month, which is a little high but usable.
The total cost is thus $12,700 per month.
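A minimal sketch that rebuilds the in-house monthly figure from the numbers above. It assumes the 5% is an annual rate on a 24-month, fully amortizing loan; the post does not say exactly how its $7,700 was derived. Under that assumption the hardware payment is about $6,770 per month (about $6,430 straight-line with no interest), and metered hosting at $185 per server is $3,700 against the $5,000 ballpark, so the $12,700 total leans high, which only makes the comparison more generous to EC2.

```python
# Monthly cost of the in-house cluster, built from the figures in the post.
SERVERS = 20
SERVER_PRICE = 7_418        # $ per server
RACKS = 2
SWITCH_PRICE = 3_000        # ~$ per rack switch
HOSTING_PER_SERVER = 185    # power, hosting and bandwidth, $ per server-month
HOSTING_BALLPARK = 5_000    # the post's rounded-up hosting figure, $ per month
ANNUAL_RATE = 0.05          # assumed annual rate, compounded monthly
MONTHS = 24


def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Level monthly payment on a fully amortizing loan (standard annuity formula)."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)


hardware = SERVERS * SERVER_PRICE + RACKS * SWITCH_PRICE
annuity = monthly_payment(hardware, ANNUAL_RATE, MONTHS)
straight_line = hardware / MONTHS
metered_hosting = SERVERS * HOSTING_PER_SERVER

print(f"hardware: ${hardware:,}")
print(f"annuity at 5%: ${annuity:,.0f}/month; straight-line: ${straight_line:,.0f}/month")
print(f"hosting: ${metered_hosting:,} metered vs ${HOSTING_BALLPARK:,} ballpark")
print(f"annuity + ballpark hosting: ${annuity + HOSTING_BALLPARK:,.0f}/month")
```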
EC2 Server Comparison
We also run simulations that require hundreds of machines. These simulations load past dates’ trade data and then run analysis to understand how the trading strategy would have performed. The Map tasks are fairly compute-intensive, with perhaps 5% of each Map task spent in I/O and 95% in calculation. Because of this we have been using c1.xlarge (High-CPU Extra Large) instances, on clusters of up to 900 machines reading from S3, to run our analysis. These runs typically take 1-2 hours.
Our unit of parallelization is the stock-day (i.e. analysing one stock for one day). An analysis for September 2012 requires map tasks for 18,050 stock-days. Running this on a 100-machine EC2 cluster reading from S3 takes 39 minutes, or roughly 4.5 stock-days per minute per machine. Running it on the 20-machine Deep Value cluster takes 35 minutes, a speed of 26 stock-days per minute per machine.
Thus one of our dual-CPU servers is equivalent to approximately 5.75 EC2 c1.xlarge instances, and our 20-machine off-the-shelf cluster is roughly equivalent to a 115-machine high-performance EC2 cluster.
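The throughput comparison can be reproduced from the raw run times. Note that the stock-day count cancels out of the ratio, which is simply (100 × 39) / (20 × 35). With unrounded figures the rates are about 4.6 and 25.8 stock-days per minute per machine, giving a ratio of about 5.6 and roughly 111 equivalent c1.xlarge instances, slightly below the rounded 5.75 and 115 used in the cost comparison.

```python
# Throughput comparison from the September 2012 run (18,050 stock-day map tasks).
STOCK_DAYS = 18_050


def rate_per_machine(machines: int, minutes: float) -> float:
    """Stock-days processed per minute per machine for one full run."""
    return STOCK_DAYS / (machines * minutes)


ec2_rate = rate_per_machine(machines=100, minutes=39)   # c1.xlarge, reading from S3
own_rate = rate_per_machine(machines=20, minutes=35)    # Deep Value's own cluster
ratio = own_rate / ec2_rate                             # c1.xlarge instances per server
equivalent_instances = 20 * ratio

print(f"EC2: {ec2_rate:.2f} stock-days/min/machine")
print(f"own: {own_rate:.2f} stock-days/min/machine")
print(f"one server ~ {ratio:.2f} c1.xlarge; 20 servers ~ {equivalent_instances:.0f} instances")
```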
Amazon EC2 Costs
This cluster has 480 TB of storage. With the Hadoop replication factor set to 3, this means an effective storage capacity of 160 TB. On S3, these 160 TB would cost us $15,965 per month, or $383,160 over 2 years.
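S3 bills storage in tiers, each tier charging only for the gigabytes that fall inside it. This minimal sketch takes the first two tier rates from the back-of-the-envelope section; the third tier ($0.095 per GB-month for the next 450 TB) is an assumption, chosen because it reproduces the $15,965 figure above. Capacity is in decimal units (1 TB = 1,000 GB), as in the post.

```python
# (tier size in GB, $ per GB-month); the third rate is an assumption.
TIERS = [
    (1_000, 0.125),     # first 1 TB
    (49_000, 0.110),    # next 49 TB
    (450_000, 0.095),   # next 450 TB (assumed rate)
]


def s3_monthly_cost(gb: float) -> float:
    """Tiered monthly storage charge: each tier bills only the GB that fall in it."""
    cost, remaining = 0.0, gb
    for size, rate in TIERS:
        billed = min(remaining, size)
        cost += billed * rate
        remaining -= billed
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("capacity exceeds the tiers modelled here")
    return cost


raw_tb = 20 * 8 * 3           # 20 servers x 8 drives x 3 TB
effective_tb = raw_tb / 3     # usable capacity with HDFS replication factor 3
monthly = s3_monthly_cost(effective_tb * 1_000)

print(f"raw {raw_tb} TB, effective {effective_tb:.0f} TB")
print(f"S3: ${monthly:,.0f}/month, ${monthly * 24:,.0f} over 2 years")
```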
The calculated cost of running a 115-machine c1.xlarge cluster on EC2 at the on-demand rate of $0.66 per instance-hour is $55,407 per month. Even with reserved instances ($2,000 up front plus $0.16 per hour, per instance), it is $32,599 per month.
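The compute figures follow from per-instance rates multiplied out over a month. This sketch rests on two assumptions the post does not state: a 730-hour month (8,760 hours / 12) and a 1-year reservation term, so the $2,000 upfront fee is spread over 12 months. With those assumptions it reproduces the on-demand $55,407 and gives $32,599 for reserved instances, the figure that is consistent with the $48,564 total below.

```python
# EC2 compute cost for a 115-instance c1.xlarge cluster running around the clock.
INSTANCES = 115
HOURS_PER_MONTH = 730        # assumption: 8,760 hours per year / 12
ON_DEMAND_RATE = 0.66        # $ per instance-hour
RESERVED_UPFRONT = 2_000     # $ per instance; assumed 1-year term
RESERVED_RATE = 0.16         # $ per instance-hour with the reservation

on_demand = INSTANCES * ON_DEMAND_RATE * HOURS_PER_MONTH
# Spread the upfront fee evenly over the assumed 12-month term.
reserved = INSTANCES * (RESERVED_UPFRONT / 12 + RESERVED_RATE * HOURS_PER_MONTH)

print(f"on-demand: ${on_demand:,.0f}/month")
print(f"reserved:  ${reserved:,.0f}/month")
```

A 3-year term would amortize the upfront fee over 36 months instead, so the reserved figure is sensitive to that assumption.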
3.8x as expensive
If we compare just what we get in terms of compute and storage, our cluster costs us $12,700 per month versus $48,564 (32,599 + 15,965) for EC2.
EC2 thus costs us over 3.8 times as much per month.
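The headline multiple is a simple ratio of the two monthly totals. The sketch below also shows why "3.8 times as much" corresponds to roughly 280% more, not 380% more.

```python
# Monthly totals from the comparison above.
own_cluster = 12_700           # hardware amortization + hosting
ec2_compute = 32_599           # 115 reserved c1.xlarge instances
ec2_storage = 15_965           # 160 TB on S3

ec2_total = ec2_compute + ec2_storage
multiple = ec2_total / own_cluster           # "times as much"
percent_more = (multiple - 1) * 100          # "percent more expensive"

print(f"EC2 ${ec2_total:,}/month vs own ${own_cluster:,}/month")
print(f"{multiple:.2f}x as much, i.e. about {percent_more:.0f}% more")
```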
Whichever way we slice this, by storage cost or by compute, it seems clear that running our own data center rather than EC2 makes sense for us. For one-off peaks EC2 makes sense, but given the ongoing nature of our simulated analysis, moving to our own data center is a very clear winner.
Firm developing effective execution, process and trading strategies for a declining market environment
Chicago, IL, June 19 – Deep Value, developer of high performance trading algorithms, and its broker-dealer entity Deep Value Enclave, are reporting record trading volumes, as well as new customer wins in the first five months of 2012.
“With lower liquidity come higher trading costs,” said Harish Devarajan, CEO of Deep Value. “When you combine this fact with volatile equity markets and a challenging return environment, you have a perfect storm brewing in the cash equities world. We are seeing fund managers and broker-dealers actively re-evaluating existing algo trading relationships, and seeking the highest-value execution performance possible.”
Deep Value is using a big data strategy to analyze terabytes of market data, as well as its own historic trading data to uncover effective execution strategies for low liquidity market environments. This research involves running “what if” scenarios on large amounts of intraday trading data using hundreds of computers.
“The strategy we are pursuing today would not have been possible even five years ago,” said Devarajan. “The big data revolution has made it possible for us to pursue insights using unprecedented scale, so that, for example, we can answer questions like ‘What would have happened to our slippage and fill rates if we showed 10% more size at the inside at a certain market center if some market condition prevailed, and 10% less at other times?’ We can simulate, with that new logic, hundreds of orders trading each day for the last few months across hundreds of machines, and get reliable answers in just a half-hour. We are able to translate such core process innovations into performance gains and customer wins.”
Deep Value is tracking to significantly exceed its 2011 volumes. In the first five months of 2012, the company averaged more than 50 million shares per day across all installed sites. May 31, 2012 was its highest volume trading day, when it processed over 150 million shares, representing 1.9 per cent of total US equity trading volume. The firm has also added a major international bank to its customer base, as well as several sell-side and buy-side firms in the first five months of 2012.
In 2011, Deep Value added 16 full-time employees to its research and development teams, making it one of the largest teams in the world dedicated solely to research and development of cash equities algorithmic trading in U.S. markets. Deep Value will be exhibiting at the SIFMA show (booth 1822) in New York, June 19 – 20.
Deep Value has offices in New York, Chicago, and Toronto in North America, and in Chennai and Kolkata in India.
About Deep Value
Deep Value is focused entirely on developing the world’s best trading algorithms. Deep Value is one of only two providers of algorithms to the Floor of the New York Stock Exchange. The company’s world-class algorithms and platform are installed both onsite at client locations and at its own datacenters. Clients include the New York Stock Exchange, prominent hedge funds and a number of other prestigious financial services powerhouses. The firm has notable analytical and computational research capabilities, running complex market and strategy simulations on terabytes of data using hundreds of machines as part of its research process. Deep Value also has strong technical capabilities. It has developed its own sophisticated trading platform on top of industry-standard open-source components. This system is fully distributed and fault-tolerant, providing graceful degradation under load and sophisticated work-scheduling frameworks. For more information visit: www.deepvalue.net.
From its start, Deep Value was a distributed organization. From the very beginning the 3 principals were in 3 separate cities: New York, Toronto and Chennai. Today, the primary development, research and operations center is in Chennai, with Toronto and Chicago serving as management, sales and support centers.
To make this work, process and distributed tools are buried deep in our DNA rather than tacked on after we reached a size where process was needed to move forward. In addition, we needed to focus on our core competencies, so we looked for tools to enhance our productivity rather than trying to build them ourselves. We were (and continue to be) on the lookout for great, inexpensive productivity tools.
So I wanted to share some of the open-source and inexpensive tools we have used to help organize ourselves at Deep Value.
1. Atlassian JIRA and GreenHopper (http://www.atlassian.com)
Getting work done and tracking what you’re doing is key. We used Bugzilla for some time, but given the critical nature of work tracking, we decided that a more robust solution was required. We experimented with several solutions, including MS Project (waterfall, arrggg) and TeamWork (www.twproject.com), but in the end an agile development methodology is really the best way to build systems in a complex, fluid environment. We wanted a centralized system for issues and user stories, and JIRA with GreenHopper works well for this.
2. Codesion (http://codesion.com/)
We needed source control, and having someone manage it securely and safely for a minimal fee made sense. As we’ve grown we have looked at doing this on-site, but the cost is low and the service level high.
3. Google Apps (http://www.google.com/apps/intl/en/business/)
We have a sizable systems team, but we want them to focus on managing our various data centers, not setting up calendars. After trying a few open-source solutions for calendars and email, we went with Google. We use it for email, calendars and shared documents. The shared documents have been especially useful in collaborating with clients, providing us with a centralized “whiteboard” that multiple parties can view. We are now using webcams and Google Hangouts to build a more cohesive team feeling.
4. Aretta (http://www.aretta.com) now CBeyond
After using Skype for a while, we went with a more full-featured VoIP provider. We evaluated several VoIP providers more than once, and Aretta won out each time. They had some minor reliability issues when they migrated data centers a year or so ago (hence searching twice), but they are the best of the inexpensive providers.
5. Workforce Growth (http://www.workforcegrowth.com)
Doing reviews is essential. Tracking all the questions and doing 360s without a tool is not for the faint-hearted. Workforce Growth is a great tool and has improved our review processes, although nothing replaces being a good listener.
6. Asana (http://asana.com/)
For managing multi-team projects with many small tasks and a lot of co-ordination, we found JIRA too heavyweight. Google Docs are too unstructured and don’t prompt action. Asana fits well with client integrations and cross-team project management. It is a great tool that we’ve recently started using more and more.
7. FollowUpThen (www.followupthen.com)
One of our issues was the “dropped email thread” problem, which led to dropped work. If you have an email that you need to follow up to completion, adding a followupthen.com address to the cc: line will get FollowUpThen to remind you if you did not get a reply from the person you sent the email to. This lets you send and forget, since FollowUpThen will prompt you if you receive no reply. No more dropped email threads.
8. Recruiterbox (www.recruiterbox.com)
Managing your recruitment pipeline and job postings is a real problem. Recruiterbox has helped us track candidates as they move through our recruitment pipeline. This ensures we have suitable statistics to measure recruitment performance, as well as a centralized repository for all the information relating to a candidate. Recruiterbox can also push job postings out to our website and to social media (LinkedIn, Facebook).