

Partitioning Your Data in BitYota’s DWS For Performance

When dealing with data, particularly Big Data, the performance of your queries can be greatly improved by organizing the data inside BitYota’s data warehouse to match your typical access patterns. BitYota enables this organization through partitioning, which decomposes very large tables into smaller, more manageable pieces, or partitions. In BitYota’s DWS, partitioning applies equally to structured and semi-structured data. This means you can partition your JSON data (loaded as native JSON, not as text) using individual keys embedded in your JSON documents, without first transforming every key into a physical column (unlike traditional data warehouses such as Redshift). This is one of our core tenets for big data exploration: imposing an a priori structure on varying or unexplored semi-structured data is detrimental to the velocity and flexibility of analytics and should be avoided.
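To illustrate the idea (not BitYota’s actual API), here is a minimal Python sketch of partitioning raw JSON documents by a key embedded in each document, without first flattening every key into a column; the function name and dotted key-path convention are hypothetical:

```python
import json
from collections import defaultdict

def partition_records(json_lines, key_path):
    """Group raw JSON documents into partitions keyed by an embedded field,
    without first flattening every key into a column."""
    partitions = defaultdict(list)
    for line in json_lines:
        doc = json.loads(line)
        # Walk the dotted key path, e.g. "user.country"
        value = doc
        for part in key_path.split("."):
            value = value.get(part) if isinstance(value, dict) else None
            if value is None:
                break
        partitions[value].append(doc)
    return partitions

events = [
    '{"user": {"country": "US"}, "action": "click"}',
    '{"user": {"country": "DE"}, "action": "view"}',
    '{"user": {"country": "US"}, "action": "buy"}',
]
parts = partition_records(events, "user.country")
```

A query filtered on `user.country` would then only need to scan the matching partition, which is the performance win partition pruning delivers.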

Providing your own schema for loading data into BitYota’s DWS

Before you load data into a BitYota DWS table, you need to specify a table schema. BitYota assists you by sampling your data before load and suggesting a schema; for users who already know the table structure they want, we offer the ability to override the suggested schema in one step. Another good reason for specifying your own schema is if you have semi-structured JSON data and want to project a few frequently accessed elements into columns for query performance. You may also want to create computed columns and add them to your table.
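As a rough sketch of the sample-then-override flow described above (the function, type names, and computed-column notation here are illustrative, not BitYota’s actual interface):

```python
import json

def suggest_schema(sample_docs):
    """Infer a column-name -> type mapping by sampling documents,
    mimicking a pre-load schema suggestion step."""
    schema = {}
    for doc in sample_docs:
        for key, value in doc.items():
            # First type seen for a key wins in this simple sketch.
            schema.setdefault(key, type(value).__name__)
    return schema

sample = [json.loads(s) for s in (
    '{"id": 1, "price": 9.99, "tags": ["a"]}',
    '{"id": 2, "price": 3.50, "name": "x"}',
)]
suggested = suggest_schema(sample)

# Override the suggestion in one step, e.g. force a decimal type...
final_schema = dict(suggested, price="numeric(10,2)")
# ...and add a hypothetical computed column derived from "price".
final_schema["price_cents"] = "int"
```

The point is that the suggested schema is a starting point; the user’s override, including projected or computed columns, is what the load ultimately uses.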

“Share and Enjoy”… or How to Use instance groups in your BitYota Data Warehouse

Simple, easy segregation of your BitYota DWS cluster resources, using storage and compute instance groups, leads to better utilization, easier capacity planning, and happier users.

Orchestrating a Real-time Data Pipeline with BitYota

Frequently, you want to manipulate the data in some way either before or after every load, say to check incoming data quality or to build aggregates that can be consumed by your BI tools. And it all has to be in real-time, on fresh data. After seeing our customers go through the pain of doing this as they loaded data into BitYota, we created a simple workflow capability that allows sequential and scheduled execution of one or more statements and User Defined Functions (UDFs) in SQL, JavaScript, Python, etc., in conjunction with a data load. We call these pre- and post-load processing steps.
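The shape of such a workflow can be sketched in a few lines of Python; this is a conceptual model only, with hypothetical step names, not BitYota’s workflow syntax:

```python
def run_load_workflow(pre_steps, load, post_steps):
    """Run pre-load checks, the load itself, then post-load steps in
    sequence, aborting if a pre-load quality check fails."""
    for step in pre_steps:
        if not step():
            raise RuntimeError(f"pre-load step failed: {step.__name__}")
    load()
    for step in post_steps:
        step()

loaded = []
aggregates = {}

def check_quality():
    # e.g. validate incoming row counts or null rates before loading
    return True

def load_batch():
    loaded.extend([1, 2, 3])

def build_aggregates():
    # e.g. a rollup a BI dashboard reads instead of the raw table
    aggregates["total"] = sum(loaded)

run_load_workflow([check_quality], load_batch, [build_aggregates])
```

Because the steps run sequentially with the load, the aggregates are always computed on the freshly loaded data.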

Big Data – a strong case for ELT (and not ETL)

Big data is forcing us to revisit data pipeline processing, from ETL/ELT to data discovery and BI metrics. Let’s discuss why ELT, and not ETL, should be the preferred technique for Big Data. Some argue that the difference is merely semantic – it’s a matter of where processing happens, and the same end result can be achieved in both methods. However, it’s about what you want from the data, how quickly, availability of system resources, data architecture, and economics.

Solving the Challenge of Data Integration in Big Data Analytics

Anyone who is in the business of big data analytics will tell you that significant effort goes into setting up and managing the data pipelines to extract and integrate data from disparate sources before analysis. A. Data Pipeline Setup: You just got access to a new source. Now the pressure is on to understand its value … Integrating a new data source or data set can be daunting. First, in-depth knowledge of the data is required to load it. Data format, schema, frequency, delimiters, and layout are some of the many attributes that need to be understood. Second, the …

"Kobayashi Maru" a.k.a.The no-win scenario for Complex Analytics on MongoDB

MongoDB is great… MongoDB’s horizontal scalability and flexibility in handling changing data structures make it an ideal choice for agile application development. Additionally, the ability to quickly create read-only copies of the data via sharding/replication makes it possible to run real-time dashboards for simple operational metrics, such as counts of unique users, directly on top of an operational MongoDB instance. … But for Complex Analytics? Beyond simple metrics, businesses also need to gain deeper insights by analyzing, slicing and dicing data in various ways; for example – correlating user visits with purchases, identifying your most popular products and your most important …

The War of the Roses – MongoDB Data Structures and Complex Analytics

When you created your MongoDB database, you created a set of collections that likely made sense for your specific web-serving needs. They may have evolved over time, but your data has somewhat settled into what you believe is the current format. Let’s take the example of a blogging site. The majority of the data needed to run the site lives in two collections – “sites” and “authors” – let’s look at the “sites” collection. A Sites document has all the information needed to render a blog page, organized like this: { _id: "site42", sitename: "Hartley's Fly Fishing", url: "/hartleysflyfishing", posts: …
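To see why this nesting fights analytics, here is a small Python sketch (with made-up field values extending the “sites” example) that unnests the embedded posts array into the flat rows a SQL engine needs for slicing and dicing:

```python
site = {
    "_id": "site42",
    "sitename": "Hartley's Fly Fishing",
    "url": "/hartleysflyfishing",
    "posts": [
        {"title": "Dry flies 101", "comments": [{"user": "ann"}, {"user": "bo"}]},
        {"title": "Knots", "comments": [{"user": "ann"}]},
    ],
}

def posts_to_rows(site_doc):
    """Unnest the embedded posts array into flat
    (site_id, post_title, n_comments) rows."""
    return [
        (site_doc["_id"], post["title"], len(post.get("comments", [])))
        for post in site_doc.get("posts", [])
    ]

rows = posts_to_rows(site)
```

A document shape optimized for rendering one blog page in a single read forces exactly this kind of unnesting before you can group, join, or aggregate across posts.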

NoSQL Stores – Scale & Performance for Transactions Yes, But For Analytics?

In the past few years, NoSQL document-based stores like MongoDB have made great inroads in new applications, for many good reasons. Unlike traditional SQL databases, you don’t have to design your information schemas upfront to the n-th degree of detail. Instead you can rely on schema-less JSON structures to keep all your data together for your application. The content of the JSON document can evolve over time without impacting your downstream database (no need to schedule database changes to add a column or modify the schema). This is no small matter — a good programmer’s freedom to choose a data …

How to use Standard SQL over JSON with BitYota’s DWS

Document-oriented NoSQL databases have gained a lot of traction in the past few years, adopted by many companies for agile, scalable application development. Their general-purpose, document-oriented design makes them appropriate for a large number of use cases such as content management systems, mobile apps, gaming, e-commerce, analytics, archiving, and logging. NoSQL databases such as MongoDB are also good for real-time operational dashboards for BI, because of their ability to support indexes as well as their agility in handling changing data structures. On top of dashboards, businesses also need to gain deeper insights by analyzing, slicing and dicing data in …
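As a small illustration of the idea of running standard SQL directly over JSON documents (using SQLite’s built-in JSON functions from Python as a stand-in for a DWS, so the table and key names here are invented for the example; this assumes a SQLite build with the JSON1 functions, which modern Python bundles include):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
docs = [
    {"user": "ann", "cart": {"total": 40}},
    {"user": "bo", "cart": {"total": 15}},
    {"user": "ann", "cart": {"total": 25}},
]
conn.executemany("INSERT INTO events VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# Standard SQL aggregation over keys embedded in the JSON documents --
# no upfront flattening of the documents into physical columns.
rows = conn.execute(
    """
    SELECT json_extract(doc, '$.user') AS user,
           SUM(json_extract(doc, '$.cart.total')) AS spend
    FROM events
    GROUP BY user
    ORDER BY spend DESC
    """
).fetchall()
```

The familiar GROUP BY/SUM vocabulary applies unchanged; only the path expressions (`$.user`, `$.cart.total`) reach inside the documents.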

Even an improbability drive needs coordinates…

Big Data is nothing new – we know credit card companies, retailers, research labs, etc. have spent the last few decades collecting, modeling and analyzing large data sets, at great cost. So what’s changed? Well, with billions of people using multiple devices and apps to access the Web, the very nature of this data has changed – it is no longer static, well structured, or predictable in advance. Rather it is “polymorphic”, shape-shifting in format over time. This polymorphic data breaks our well-defined workflows of bringing data into a data mart for analytics. The old way used to involve collecting structured …

About BitYota

The Hitchhiker’s Guide to Big Data Analytics Welcome to the company blog from BitYota, the next-gen Data Warehouse-as-a-Service for Big Data Analytics, accessible by anyone, anywhere. We are an expert, well-funded team with 35+ years of cumulative experience in building data platforms at Oracle, Informix, Yahoo!, Veritas, Tibco, etc. Our vision is to make data and analytics accessible to all. We know our vision is audacious and requires a complete rethinking of traditional data warehousing concepts and technologies. We believe we have the team, the tenacity and talent to do this, and we are excited to take this journey with …

Get insights in minutes

Find the value you can add for your customers and your business today. Spin up a node, load your data and start running analysis.