Blog

JSON, SQL and Marketing Analytics

More and more developers are turning to NoSQL databases, and document databases in particular, for their flexibility and their fit for marketing analytics. In many instances, this approach can be far more efficient than using Hadoop and HDFS. Document databases store JSON objects (or other semi-structured objects) and provide specialized query tools for retrieving and manipulating data. But what if you could have the flexibility of JSON structures combined with the expressive query capabilities of SQL? You would get the advantages of flexible storage along with the power of a well-known and expressive query language. With BitYota’s Data Warehouse as …
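To make that combination concrete, here is a minimal sketch in generic SQL, assuming a hypothetical events table with a raw JSON payload column and a json_extract-style accessor (the names are illustrative, not BitYota’s actual syntax):

    -- Hypothetical: count signups per campaign straight from raw JSON events.
    -- Assumes a json_extract(column, path) accessor; all names are illustrative.
    SELECT json_extract(payload, '$.campaign') AS campaign,
           COUNT(*)                            AS signups
    FROM   events
    WHERE  json_extract(payload, '$.type') = 'signup'
    GROUP  BY json_extract(payload, '$.campaign')
    ORDER  BY signups DESC;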

Analytics Tools to Optimize Digital Marketing Spend in a Multi-Channel World

These days, marketers are in a “best of times and worst of times.” It’s the best of times because they have direct access to potential customers through multiple marketing channels that allow them to collect detailed data about those individuals. It’s the worst of times because much of this valuable information lies dormant in large volumes of data that are challenging to integrate, and even more difficult to analyze. Among other things, this leaves marketers wondering which are the most effective channels for reaching prospects. Another challenge they confront is how they can use this data to optimize a customer’s …

Query Across Data Tiers: Analytical Benefits without the Query Limitations

Let’s consider a typical data warehouse application that provides details on time series data, such as sales by region, product, and several other dimensions to support a business analytics or digital marketing application. Its schema is designed to optimize the query performance of the most frequently run queries. Not surprisingly, these are queries about the most recent sales data. There is also some reference to older data, such as sales from the same period last year, but those attributes are calculated and stored along with current period data. (Yet another example of the benefits of denormalization, but that is a …
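As a sketch of the denormalization mentioned above, a fact table can carry last year’s figure on the same row as the current period, so the common period-over-period query never touches older data (schema and names are illustrative):

    -- Illustrative fact table: prior-year sales stored alongside the current row.
    CREATE TABLE sales_by_region (
        period_month            DATE,
        region                  VARCHAR(64),
        product                 VARCHAR(64),
        sales_amount            DECIMAL(12,2),
        sales_amount_prior_year DECIMAL(12,2)  -- precomputed at load time
    );

    -- Period-over-period comparison without a self-join on historical data.
    SELECT region,
           SUM(sales_amount)            AS current_sales,
           SUM(sales_amount_prior_year) AS last_year_sales
    FROM   sales_by_region
    WHERE  period_month = DATE '2015-06-01'
    GROUP  BY region;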

Limitations of Amazon Redshift in the Analytics Stack

Amazon Redshift has become a popular choice for cloud-based data warehousing, but it is not always the best choice, especially for analytics use cases. This point of view stems from the architectural constraints of Redshift data warehouses. To understand how the constraints of a data warehouse service can impact analytic operations, it helps to clarify some key characteristics of analytics-oriented data warehousing. Analytics is an iterative, exploratory process. Unlike business intelligence reporting, analytic operations start with a hypothesis. This could be something as simple as assessing how changing the size or shape of icons on a Web page will alter customer …

The Power of User Defined Functions

Data warehouse applications and analytics are increasingly complex. We load data in structured, semi-structured and unstructured formats and expect greater insights from this richer combination of data sets. These insights are not always easy to come by. Once we have our various data sources loaded, we typically need to perform extraction and transformation operations. These operations need to work on data in all its forms while keeping up with the volumes we are ingesting. In many instances, it makes sense to apply transformations during load operations when the results of the transformation will be frequently used in a …
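A minimal sketch of the idea, using a generic CREATE FUNCTION form (UDF syntax varies by warehouse, and both the function and table names here are hypothetical):

    -- Hypothetical JavaScript UDF; the exact CREATE FUNCTION syntax varies by product.
    CREATE FUNCTION extract_domain(email VARCHAR)
    RETURNS VARCHAR
    LANGUAGE JAVASCRIPT
    AS 'return email.substring(email.indexOf("@") + 1);';

    -- Materialize the transformed value once, during the load step,
    -- so frequently run queries do not recompute it.
    INSERT INTO users_clean
    SELECT user_id, email, extract_domain(email) AS email_domain
    FROM   users_raw;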

How Modern Data Warehouse Architectures Support Price/Performance Choices

The cloud has fundamentally changed the way we deliver data warehousing and analytics services, but not all cloud-based data warehouse solutions fully exploit the potential cost advantages of the cloud. Early cloud data warehouse services replicated the on-premises deployment model in the cloud. Data warehouses were typically deployed as single large servers or a cluster of servers with tightly coupled storage. Database administrators had some ability to scale to variable loads, but the tight coupling of storage and compute resources made it difficult to shut down the cluster. In order to do so, they were required to copy data to …

All SQLs are Not Alike: Part 2

Although the SQL query language has been formally described in a series of standard definitions, recent advances in data management technologies have led to variations in query languages. The advent of big data platforms, such as Hadoop, and NoSQL databases, such as Cassandra, brought with them query languages suited for their underlying data management architectures. These different query languages can lead to substantially different performance characteristics. The features of a query language define the range of queries that can be expressed. Expressive queries can target specific subsets of data. This allows developers to focus on writing application code that processes …
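As one concrete example of expressiveness, a dialect with window functions can return just the top-selling product per region in a single statement, so the application never has to pull back and filter the full result set (schema is illustrative):

    -- Top-selling product per region, computed entirely inside the warehouse.
    SELECT region, product, total_sales
    FROM (
        SELECT region, product,
               SUM(amount) AS total_sales,
               RANK() OVER (PARTITION BY region
                            ORDER BY SUM(amount) DESC) AS rnk
        FROM   sales
        GROUP  BY region, product
    ) ranked
    WHERE  rnk = 1;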

All SQLs are Not Alike: Part 1

The best way to describe the various implementations of SQL across the spectrum of relational databases is “the same, but different.” Proponents of a particular implementation might understandably emphasize how well their favored tool meets the needs of SQL developers, while paying less attention to the lack of standard language features. Developers who are accustomed to full-featured SQL implementations will quickly spot the lack of key features in limited SQL implementations. This lack of expressiveness in a SQL implementation can adversely affect developer productivity and performance. Any query language that uses the label ‘SQL’ anywhere in its name will have …

Data Loading: The Need to Support Semi-Structured Data

Modern data warehouse solutions must accommodate a variety of data sources. Not all these sources behave like traditional enterprise applications such as enterprise resource planning (ERP) or customer relationship management (CRM) systems. Sources ranging from social media streams and instrumented automobile and truck fleets to emerging Internet of Things (IoT) devices can expand the scope of analytics done with modern data warehouses. Two characteristics distinguish modern data warehouses from their traditional counterparts: widespread support for semi-structured data and the need for real-time data ingestion. Semi-structured data is an increasingly important format because of JSON’s …

ELT vs ETL: Adapting to the Need for High-Performance JSON Loads

Extracting, transforming, and loading form a fundamental set of data warehouse operations. The transformation stage is often done prior to loading (ETL), but transforming after loading (ELT) can be a better option. This is particularly the case when working with semi-structured data. NoSQL document databases natively support JSON, XML and other semi-structured data types. The growing popularity of document databases is due, in part, to their schema flexibility. Data modelers do not need to define a fixed schema. Instead, modelers and developers can incorporate new attributes as needs change without modifying a predefined data model. Developers can also take advantage …
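A minimal ELT sketch along those lines: land the raw JSON first, then project attributes inside the warehouse (the COPY form and json_extract accessor are illustrative, not a specific product’s API):

    -- ELT: load raw JSON as-is, transform after the data has landed.
    CREATE TABLE clicks_raw (payload JSON);
    COPY clicks_raw FROM 's3://bucket/clicks/2015-06-01.json';  -- illustrative COPY form

    -- Transform in-warehouse: project only the attributes analysts query.
    CREATE TABLE clicks AS
    SELECT json_extract(payload, '$.user_id') AS user_id,
           json_extract(payload, '$.url')     AS url,
           json_extract(payload, '$.ts')      AS clicked_at
    FROM   clicks_raw;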

Supporting Multi-tiered Storage and Access in Cloud Data Warehouses

Data warehouse designers are constantly expected to perform a balancing act: deliver high performance, low latency data warehouse queries while keeping costs down. Some common best practices in data warehousing, such as partitioning and star schemas, are motivated in large part by performance concerns. Even with good design practices and reasoned infrastructure design, there is a limit to the marginal benefits data warehouse designers can squeeze out of existing hardware. In the past, many faced the choice of tolerating less than expected performance or paying substantially more for somewhat better disk performance. Cloud data warehouses have improved the options using …

BitYota DWS is Available Today on Microsoft Azure

We are pleased to announce that BitYota Data Warehouse Service (DWS) is now in general availability (GA) for customers on the Microsoft Azure cloud computing platform. We are also proud to point out that BitYota is currently the only Data Warehouse as a Service (DWaaS) available today for installation on Microsoft Azure, which means Azure shops can immediately apply our massively parallel processing (MPP) engine to a wide range of tasks. In this post, I’d like to introduce you to the salient features of BitYota on Azure. One of the biggest benefits we bring to the table is that Microsoft Azure-based enterprises …

First Principles For a Cloud-based Data Warehouse Service – specialized instances for specialized tasks… buy only what you need!

The advent of cloud technologies has led to strong winds of disruptive innovation and change blowing across the RDBMS landscape. The ubiquity and scalability of cloud computing offer enterprises more power and flexibility than ever before, and demand consideration of new approaches to managing and processing structured and unstructured data together. Rob Klopp, a noted database expert and commentator, recently published a white paper that discusses these trends and provides guidance for database administrators as they weigh different architectural approaches to leverage the benefits of the cloud. This is the first in a series of posts in which we will …

Partitioning Your Data in BitYota’s DWS For Performance

When dealing with data, particularly Big Data, the performance of your queries can be greatly improved by organizing the data inside BitYota’s data warehouse to match your typical data access patterns. BitYota enables you to do this data organization using partitioning. Partitioning decomposes very large tables into smaller, more manageable pieces, or partitions. In BitYota’s DWS, partitioning is equally applicable to structured and semi-structured data. This means you can partition your JSON data (loaded as native JSON, not as text) using individual keys embedded in your JSON documents, without having to first transform every key into a physical column (unlike traditional data warehouses such as Redshift). This is one of our core tenets for big data exploration: having to impose an a priori structure on varying or unexplored semi-structured data is detrimental to the velocity and flexibility of analytics and should be avoided.
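A hypothetical sketch of what partitioning on a key inside a native JSON document might look like; the PARTITION BY form and json_extract accessor are illustrative, not BitYota’s actual DDL:

    -- Illustrative: partition a native-JSON table by a key inside the document,
    -- with no requirement to flatten that key into a physical column first.
    CREATE TABLE events (doc JSON)
    PARTITION BY RANGE (json_extract(doc, '$.event_date'));

    -- Queries that filter on the partitioning key can skip whole partitions.
    SELECT COUNT(*)
    FROM   events
    WHERE  json_extract(doc, '$.event_date') >= '2015-06-01';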

Providing your own schema for loading data into BitYota’s DWS

Before you load data into a BitYota DWS table, you need to specify a table schema. BitYota assists by sampling your data before load and suggesting a schema, but for users who know the table structure they want, we offer the ability to override the suggested schema in one step. Another good reason for specifying your own schema is if you have semi-structured JSON data and want to project a few frequently accessed elements into columns for query performance. You may also want to create computed columns and add them to your table.
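For illustration only, a user-supplied schema along those lines might project two frequently accessed JSON elements into columns; the GENERATED ALWAYS AS form is generic SQL, not BitYota’s exact DDL:

    -- Illustrative schema override: keep the full document, but expose
    -- hot JSON elements as real columns for query performance.
    CREATE TABLE orders (
        doc          JSON,
        customer_id  VARCHAR(32)   GENERATED ALWAYS AS (json_extract(doc, '$.customer.id')),
        order_total  DECIMAL(10,2) GENERATED ALWAYS AS (json_extract(doc, '$.total'))
    );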

“Share and Enjoy”… or How to Use instance groups in your BitYota Data Warehouse

Simple, easy segregation of your BitYota DWS cluster resources using storage and compute instance groups leads to better utilization, easier capacity planning, and happier users.

Orchestrating a Real-time Data Pipeline with BitYota

Frequently, you want to manipulate the data in some way either before or after every load, say to check incoming data quality or to build aggregates that can be consumed by your BI tools. And it all has to happen in real time, on fresh data. After seeing our customers go through the pain of doing this as they loaded data into BitYota, we created a simple workflow capability that allows sequential and scheduled execution of one or more statements and User Defined Functions (UDFs) in SQL, JavaScript, Python, and other languages, in conjunction with a data load. We call these pre- and post-load processing steps.
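A minimal sketch of what such pre- and post-load steps could contain; the staging and target table names are hypothetical, and the scheduling wrapper itself is omitted:

    -- Pre-load step (data quality): drop staged rows that fail a basic check.
    DELETE FROM staging_events
    WHERE  json_extract(doc, '$.user_id') IS NULL;

    -- Post-load step (aggregation): refresh a table consumed by BI dashboards.
    CREATE TABLE daily_active_users AS
    SELECT json_extract(doc, '$.event_date')              AS event_date,
           COUNT(DISTINCT json_extract(doc, '$.user_id')) AS dau
    FROM   events
    GROUP  BY json_extract(doc, '$.event_date');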

Big Data – a strong case for ELT (and not ETL)

Big data is forcing us to revisit data pipeline processing, from ETL/ELT to data discovery and BI metrics. Let’s discuss why ELT, and not ETL, should be the preferred technique for Big Data. Some argue that the difference is merely semantic – it’s a matter of where processing happens, and the same end result can be achieved in both methods. However, it’s about what you want from the data, how quickly, availability of system resources, data architecture and economics.

Solving the Challenge of Data Integration in Big Data Analytics

Anyone who is in the business of big data analytics will tell you that significant effort goes into setting up and managing the data pipelines to extract and integrate data from disparate sources before analysis. A. Data Pipeline Setup: You just got access to a new source. Now the pressure is on to understand its value … Integrating a new data source or data set can be daunting. First, an in-depth knowledge of the data is required to load it. Data format, schema, frequency, delimiters, and layout are some of the many attributes which need to be understood. Second, the …

"Kobayashi Maru" a.k.a.The no-win scenario for Complex Analytics on MongoDB

MongoDB is great… MongoDB’s horizontal scalability and flexibility in handling changing data structures make it an ideal choice for agile application development. Additionally, the ability to quickly create read-only copies of the data via sharding/replication makes it possible to run real-time dashboards for simple operational metrics, such as counts of unique users, directly on top of an operational MongoDB instance. … But for Complex Analytics? Beyond simple metrics, businesses also need to gain deeper insights by analyzing, slicing, and dicing data in various ways; for example, correlating user visits with purchases, identifying your most popular products and your most important …
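The kind of analysis the post has in mind is a one-statement job in SQL; here is an illustrative sketch correlating visits with purchases (tables and columns are hypothetical):

    -- Correlate user visits with purchases made within a day of the visit.
    -- One join + aggregate in SQL; on MongoDB this spans collections and
    -- typically needs application-side code or multi-stage pipelines.
    SELECT v.channel,
           COUNT(DISTINCT v.user_id) AS visitors,
           COUNT(DISTINCT p.user_id) AS buyers,
           SUM(p.amount)             AS revenue
    FROM   visits v
    LEFT   JOIN purchases p
           ON  p.user_id = v.user_id
           AND p.purchased_at BETWEEN v.visited_at
                                  AND v.visited_at + INTERVAL '1' DAY
    GROUP  BY v.channel;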
