Achieving and maintaining a good security posture requires good situational awareness. Good situational awareness, in turn, requires capabilities that accurately reflect network status in real time and are simple to use and access. Those capabilities must also be economical, as every CISO tries to stretch a tight budget. So when my long-time friend, Christian Shrauder, called me to discuss his newest blog, where he describes a technology that has all of these capabilities and more, well, I found myself listening quite closely. Yellowbrick Data is Christian’s newest venture, where he manages the federal business. Capabilities that highlight speeds and feeds have always taken center court in many areas of the government industry, and Yellowbrick seems to be no exception. However, Yellowbrick’s data warehousing solution is also a standout from an ease-of-use perspective. I am not going to steal any more of Christian’s thunder, so read the interview to learn more about how Yellowbrick Data works and its mind-bending performance features. You can also click on the ad to the right to read Christian’s blog and to find out more about Yellowbrick Data’s solutions.
Spotlight on Mr. Christian Shrauder
» Title: Christian Shrauder, Director of Federal Systems, Yellowbrick Data
» Website: https://yellowbrick.com/
» LinkedIn: https://www.linkedin.com/in/christian-shrauder-71759a14
Read his bio below.
Chris Daly, Active Cyber™: Can you provide an overview of Yellowbrick, including the origin of the company?
Mr. Christian Shrauder, Director of Federal Systems, Yellowbrick Data: Yellowbrick Data was founded in 2014 by experts in database and flash memory technologies to simplify data warehousing. The founders aimed to solve the challenges of high availability, running complex mixed workloads, support for ad hoc SQL, computing correct answers on any schema, delivering massive scalability, and supporting large numbers of concurrent users.
Yellowbrick employs experts in flash memory, including former Fusion-io CTO Neil Carson (now CEO) and former Fusion-io CRO Jim Dawson (now CRO). We also employ experts in flash-centric databases and data warehousing, including former Aerospike CTO, Brian Bulkowski (now CTO) and staff from Microsoft, IBM, Google and elsewhere.
The Yellowbrick Data Warehouse easily handles the requirements for a modern data warehouse, is quick to deploy, easy to expand, and simple to manage. It also fundamentally changes the economics of enterprise data warehousing to deliver the lowest acquisition and operating costs. Yellowbrick can be deployed in customer data centers or the cloud, offering complete customer choice.
Active Cyber™: What is a profile of a typical customer and what are some examples of your customer base?
Mr. Shrauder: Yellowbrick serves enterprises with a critical need for large scale data warehousing. Our customers span all key vertical markets including telecommunications, finance, healthcare, insurance, retail, transportation, hospitality and more. These key customers share a need to handle large amounts of data with fast and economical processing. A few specific examples include telecommunications analytics with TEOCO, ecommerce analytics at Overstock.com, healthcare analytics with Allscripts, and hospitality analytics with Melco Resorts.
Active Cyber™: In your blog you discuss Yellowbrick’s ability to handle Netflow. What are some of the challenges in processing Netflow data efficiently and in cyber relevant time? How is Yellowbrick uniquely capable in addressing these challenges?
Mr. Shrauder: In the blog post I describe how Netflow is a challenging dataset due to its performance demands and the large volumes of data enterprises have to store for historical analysis. Netflow is like network metadata and tracks each transaction with time stamps, source and destination IPs, duration, bytes transmitted, and so forth. As you can imagine, with every network-enabled device creating potentially hundreds of connections a day, the larger the enterprise, the larger the problem. Because Netflow data is highly structured, relational databases are ideal for delivering fast analytics and short query response times on this data. However, they typically have to keep the entire active dataset in memory to perform well. I’ve run into customers that get tens of billions of records a day, which quickly reaches the multi-terabyte capacity range. It isn’t economical to keep this amount of data in memory, so they have to resort to other means to achieve acceptable performance levels.
Full-text indexing helps with this problem, but if you want to query data that is anywhere close to real time, your bottleneck becomes your ingest speed. In addition, indexes can often be larger than the actual data, increasing capacity requirements by up to 10x, which requires huge multi-rack clusters and millions of dollars in acquisition and support costs. Any change to the configuration or data requires time-consuming re-indexing, with a corresponding lag in real-time visibility.
Yellowbrick gives Netflow environments a high-performance relational database that scales to petabyte-sized datasets. Its unique NVMe flash backend delivers data at in-memory performance directly from flash memory, so it doesn’t fall down when the dataset size is 10 or even 100 times larger than DRAM capacity. Also, as a Massively Parallel Processing (MPP) solution, Yellowbrick can scale out compute or capacity as demands grow, and it does so in a very small footprint, which reduces costs.
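To make the point about Netflow being highly structured a little more concrete, here is a minimal sketch of the fields mentioned above (time stamp, source and destination IP, duration, bytes) stored in a relational table and summarized with plain SQL. It uses Python's built-in sqlite3 module purely as a stand-in relational engine; the table name, column names, and sample records are hypothetical and are not taken from Yellowbrick or from Christian's tests.

```python
import sqlite3

# Hypothetical flow records: (start_time, src_ip, dst_ip, duration_s, bytes_sent).
# The addresses come from private and documentation ranges; the values are made up.
SAMPLE_FLOWS = [
    ("2019-06-01T10:00:00", "10.0.0.5", "192.0.2.10", 1.2, 4000),
    ("2019-06-01T10:00:03", "10.0.0.5", "198.51.100.7", 0.4, 1500),
    ("2019-06-01T10:01:10", "10.0.0.9", "192.0.2.10", 12.0, 90000),
    ("2019-06-01T10:02:45", "10.0.0.5", "192.0.2.10", 3.1, 2500),
]


def load_flows(rows):
    """Create an in-memory table and load the flow records into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows ("
        "start_time TEXT, src_ip TEXT, dst_ip TEXT, "
        "duration_s REAL, bytes_sent INTEGER)"
    )
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def top_talkers(conn, limit=10):
    """Return (src_ip, flow_count, total_bytes), largest senders first."""
    cursor = conn.execute(
        "SELECT src_ip, COUNT(*) AS flow_count, SUM(bytes_sent) AS total_bytes "
        "FROM flows GROUP BY src_ip ORDER BY total_bytes DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


conn = load_flows(SAMPLE_FLOWS)
talkers = top_talkers(conn)
for src_ip, flow_count, total_bytes in talkers:
    print(src_ip, flow_count, total_bytes)
```

The query itself is ordinary SQL, so the same kind of "top talkers" question could in principle be asked of any relational system; what changes at tens of billions of rows a day is how quickly the engine can ingest and scan the data.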
Active Cyber™: What are some of the performance gains that make Yellowbrick the ideal platform for processing Netflow data for maintaining network visibility and cyber threat intelligence management?
Mr. Shrauder: Yellowbrick routinely bests other solutions by 100x, on a fraction of the hardware.
First, Yellowbrick ingests data very quickly. In my blog post, I show results from tests in which Yellowbrick ingests a month’s worth of data (308GB and 3.4 billion rows) in five minutes. Ingest speeds were at a rate of 10M+ records every second.
Second, it scales massively. My tests also included measuring query response times as dataset size increased. Query latency stayed relatively flat as the number of rows increased from 500 million to about 14 billion. At 500 million rows, queries responded in 0.006 seconds. As the data reached ~14 billion rows, query response time grew modestly to 0.0122 seconds. Toward the end of the test, the query was returning hundreds of records out of ~14 billion in about 1/8th of a second.
The impact of this on customers is immense. Rapid data ingest means they can detect threats faster. They can also detect threats with greater accuracy, spotting patterns with queries over more historical data.
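As a quick sanity check on the ingest figures quoted above, the arithmetic can be sketched as follows. The row count, data size, and duration come from the interview; treating 1 GB as 10^9 bytes and averaging evenly over the five minutes are assumptions made only for this back-of-the-envelope calculation.

```python
# Figures quoted in the interview.
ROWS_INGESTED = 3_400_000_000   # 3.4 billion rows
DATA_SIZE_GB = 308              # 308GB for a month of data
INGEST_SECONDS = 5 * 60         # five minutes

# Assumption for this sketch: 1 GB = 10**9 bytes (decimal gigabytes).
BYTES_PER_GB = 10**9

# Average rates over the whole load, assuming a steady ingest.
rows_per_second = ROWS_INGESTED / INGEST_SECONDS
gb_per_second = DATA_SIZE_GB / INGEST_SECONDS
bytes_per_row = DATA_SIZE_GB * BYTES_PER_GB / ROWS_INGESTED

print(f"{rows_per_second / 1e6:.1f} million rows per second")
print(f"{gb_per_second:.2f} GB per second")
print(f"{bytes_per_row:.0f} bytes per row on average")
```

Under those assumptions the average works out to a little over 11 million rows per second, which is consistent with the "10M+ records every second" figure, at roughly 1 GB per second and about 90 bytes per row.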
Active Cyber™: What are some of the hardware and software optimizations that enable the impressive performance gains provided by Yellowbrick Data?
Mr. Shrauder: Key innovations on the software side include how we handle the flash. For example, our unique database software can deliver queries directly out of flash with what we call “native flash queries.” This sets Yellowbrick apart because we don’t have to rely on DRAM for performance. We have also delivered innovations optimizing the database for fast ingest, while remaining relational to support fast, complex queries on very large datasets.
From a hardware perspective, much of our innovation includes our ability to integrate high-core count CPUs, memory, NVMe flash and high speed networking. Together this provides an efficient solution for customers to get the most value out of their data.
Active Cyber™: High performance data warehouse systems often come with added space, weight, power and complexity. How does Yellowbrick compare with respect to these characteristics to other data warehouse providers? What levels / types of expertise are needed to run and maintain it?
Mr. Shrauder: Yellowbrick shines when it comes to performance density. We deliver significantly more performance in as little as 1/20th of the space, and correspondingly less power and cooling.
Yellowbrick is also designed to be simple and easy to use. We automate or eliminate the need for mundane tasks like maintaining statistics and indexes, and running vacuums. We are also compatible with PostgreSQL and utilize its drivers. If someone knows PostgreSQL, they can pick up Yellowbrick easily in a couple of weeks. While competing solutions require a team to maintain the system, Yellowbrick actually frees DBAs from maintenance so they can focus on bringing in new data types and new capabilities to the system.
Active Cyber™: Many data warehouses have proprietary mechanisms that are often unfamiliar to customers and often lock them in. What is Yellowbrick’s approach to open standards, as well as interoperability?
Mr. Shrauder: Yellowbrick fits easily within existing environments by maintaining standard interfaces such as ANSI SQL, and specifically the PostgreSQL dialect. Customers can import and export data between Yellowbrick and any other PostgreSQL-based systems over standard connectors with a basic export/import of the data. Ultimately the data is relational and any system that can import a CSV can read the data and host it.
Active Cyber™: Data volume and velocity are two characteristics that are table stakes and need to be addressed by today’s and tomorrow’s data warehouses. What about data variety? How well does Yellowbrick handle mixed workloads and different data types?
Mr. Shrauder: Yellowbrick has a relational database, so we handle all the data types you would expect for a traditional RDBMS. In addition, we have released connectors such as ybrelay that can read data directly from open source solutions like Apache Spark and pull from a wide variety of sources including AVRO, JSON, and Apache Parquet.
Yellowbrick truly shines when it comes to handling mixed workloads. Many competing solutions cannot handle data ingest, reporting, and ad hoc workloads at the same time. Yellowbrick includes a hybrid row/column store for fast ingest and queries, rich workload management features that protect critical applications from noisy neighbors and poorly written queries, and an NVMe-based back end that enables users to execute ad hoc queries to atomic data without the need for rollups and cubes. Prior to Yellowbrick, enterprises would have had to deploy multiple solutions to achieve all this functionality.
Active Cyber™: How is the appliance deployed in the data center? In the cloud? What types of deployment services are available? What are some of the major tasks and dependencies for a typical install?
Mr. Shrauder: For on-premises deployment, the job is simple: provide a rack, power, and networking and you are up and running. A Yellowbrick deployment can take 30 minutes once basic prep work, like running network cables, has been done.
Yellowbrick also offers cloud options for customers with more details coming later this year.
Active Cyber™: What is your market strategy and how have you adapted your technology and market strategy since you emerged from stealth mode? What is your key differentiator and approach for the federal market?
Mr. Shrauder: Yellowbrick focuses on working with the largest enterprises and agencies to deliver game-changing analytic performance at the most efficient costs. We appreciate that the data warehousing market is not new, but in many respects it has been neglected over the years, leaving customers in need of performance solutions without many choices.
Now that we have emerged, we remain dedicated to our mission of helping customers solve critical analytic challenges. Several of the customer videos on our website point to how we have done this to date by delivering a highly-performant solution at a fraction of the cost of competing solutions.
Our biggest differentiator in the Federal space is our simplicity. In fact, a key objective in designing a system with such mind-blowing performance was simply to make performance something that organizations no longer need to think about. Yellowbrick can be up and running in 30 minutes. In contrast, solutions from legacy vendors can take months and some open source solutions turn into science projects that take years to produce a return.
Our biggest successes in the Federal space have been finding innovative customers that understand that the most advanced technologies rarely come from the most traditional vendors. We have also been fortunate in working with the right partners — ones who really understand our value in the Federal space. Importantly, Yellowbrick includes features that Federal customers commonly require, such as AES 256-bit encryption-at-rest and LDAP/AD integration.
Thank you, Christian, for this in-depth view of Yellowbrick’s truly remarkable data warehousing capabilities. As the need for greater network visibility increases with the growth of the IoT and to address security concerns, I am certain the demand for the Yellowbrick solution will rise dramatically as well. I am also sure that the federal market’s ever-increasing appetite for data will result in substantial growth for Yellowbrick. I look forward to hearing more about Yellowbrick’s continued success in the market and especially about announcements regarding your cloud capability. And thanks to my subscribers and visitors to my site for checking out ActiveCyber.net! Please give us your feedback because we’d love to know some topics you’d like to hear about in the area of active cyber defenses, PQ cryptography, risk assessment and modeling, autonomous security, digital forensics, securing ICS / IIoT and IoT systems, or other security topics. Also, email marketing@activecyber.net if you’re interested in interviewing or advertising with us at Active Cyber™.
About Mr. Christian Shrauder
Christian Shrauder is Director of Federal Systems at Yellowbrick Data, where he is responsible for all US Government business. Previously, Christian was CTO of Federal Systems at Fusion-io, where he worked closely with the Intelligence Community and HPC community to implement large-scale data processing systems. Prior to Fusion-io, Christian was Principal Engineer with the MITRE Corporation, where he specialized in performance optimization, digital forensics, and information security.