BDB Big Data Pipeline

Real time processing, Elastic Architecture, High Scalability

Discover Real Power of Big Data Analytics With BDB Pipeline

BizViz Mission

Let organized data lead your business strategy efficiently. Gain seamless insight into huge amounts of structured, semi-structured, or unstructured data, ranging from UI activities, logs, performance events, sensor data, emails, and social media to organizational documents. Support your decisions with advanced machine learning algorithms and visualization techniques, all in one go.

Existing System

Enterprises dealing with multiple sources of business data may have information collected from different ERPs, diverse applications, and social media (structured, semi-structured, and unstructured data). If they process this data redundantly, they need to move the entire volume in batches through ETL layers written with different products, stored procedures, triggers, etc. Ultimately, they end up with complexity and no clear insight.

Data Ingestion Layer

Big data ingestion is about moving data from the existing systems, or wherever it originated, into a system where it can be stored, processed, and analysed. The BDB Big Data Pipeline moves the collated data into a massive Data Lake (for example, one based on Hadoop HDFS, Cassandra, or HBase).

Data ingestion may be continuous or asynchronous, and real-time, batched, or both (lambda architecture), depending on the characteristics of the source and the destination. In many scenarios, the source and the destination do not share the same data format, so the data needs some type of transformation before the destination system can use it. BDB Data Pipeline harnesses Apache Kafka integrated with Apache Spark to support this activity.
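
The format transformation mentioned above can be illustrated with a minimal sketch. It uses plain Python rather than Kafka or Spark, and the field names (sensor_id, temperature) and the function csv_to_json_lines are hypothetical, chosen only for illustration; it is not the BDB implementation.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> list[str]:
    """Convert CSV rows (source format) into JSON strings (destination format)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Hypothetical rule: cast the numeric field so the destination
        # receives a typed value instead of a string.
        row["temperature"] = float(row["temperature"])
        records.append(json.dumps(row, sort_keys=True))
    return records


# Assumed sample input: two sensor readings exported as CSV.
source = "sensor_id,temperature\ns1,21.5\ns2,19.0\n"
json_lines = csv_to_json_lines(source)
```

In a streaming setup, a step like this would typically run on each message or micro-batch before the result is written to the lake.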

Our API gateway, built on top of the BDB Data Ingestion Layer, makes ingestion independent of the data lake software in use. The coarse-grained API shields users from the nitty-gritty details of low-level data interaction. Customers can use our API set to ingest data into the data lake layer.

Note: In this case, the customer should follow certain rules, which our system cookbook recommends.
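
The idea of data lake software neutrality can be sketched as one ingest call that works against interchangeable backends. Everything here is an assumption made for illustration: the LakeBackend interface, the two in-memory stand-ins, and the ingest function are hypothetical and do not reflect the actual BDB API.

```python
from typing import Protocol


class LakeBackend(Protocol):
    """Hypothetical interface every lake backend is assumed to satisfy."""

    def write(self, dataset: str, record: dict) -> None: ...


class RowStoreBackend:
    """In-memory stand-in for a row-oriented store (Cassandra-like)."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, dict]] = []

    def write(self, dataset: str, record: dict) -> None:
        self.rows.append((dataset, record))


class FileStoreBackend:
    """In-memory stand-in for a file-oriented store (HDFS-like)."""

    def __init__(self) -> None:
        self.files: dict[str, list[dict]] = {}

    def write(self, dataset: str, record: dict) -> None:
        self.files.setdefault(dataset, []).append(record)


def ingest(backend: LakeBackend, dataset: str, records: list[dict]) -> int:
    """Coarse-grained call: the caller never touches backend details."""
    for record in records:
        backend.write(dataset, record)
    return len(records)


events = [{"user": "u1", "action": "login"}, {"user": "u2", "action": "logout"}]
row_store, file_store = RowStoreBackend(), FileStoreBackend()
written_rows = ingest(row_store, "ui_events", events)
written_files = ingest(file_store, "ui_events", events)
```

The caller's code is identical for both backends, which is the property the gateway is described as providing.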

Data Lake

A Data Lake is a repository that holds a large amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analysed to help answer the question.
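
The flat layout described above, with a unique identifier and metadata tags per element, can be sketched in a few lines. This is an in-memory illustration only; the class names FlatLake and LakeObject and the tag keys are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    payload: bytes                 # raw data kept in its native format
    tags: dict[str, str]           # extended metadata tags
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FlatLake:
    """Flat store: no folders, every element is addressed by its identifier."""

    def __init__(self) -> None:
        self._objects: dict[str, LakeObject] = {}

    def put(self, payload: bytes, **tags: str) -> str:
        obj = LakeObject(payload=payload, tags=dict(tags))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def query(self, **wanted: str) -> list[LakeObject]:
        # Return only the elements whose tags match every requested value.
        return [
            obj for obj in self._objects.values()
            if all(obj.tags.get(key) == value for key, value in wanted.items())
        ]


lake = FlatLake()
log_id = lake.put(b"GET /home 200", source="weblog", year="2024")
lake.put(b"temp=21.5", source="sensor", year="2024")
weblogs = lake.query(source="weblog")  # the smaller, relevant set
```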

The BDB Data Lake can be Cassandra, HBase, or Parquet on Hadoop HDFS, depending on what the customer chooses, with Apache Spark alongside as the processing engine.

BDB Big Data technologies allow large amounts of data to be handled in an effective yet cost-friendly manner by keeping several zones in a data lake.

Raw Zone – This zone contains the transactional data that has been ingested. It holds the data ‘as is’, before any Data Cleaning or Unification Layers are applied.

Error Zone – This zone contains disputed data that failed the data quality checks or remains unclear after the quality and clean-up processes. Data analysts rework this data to clean it up and move it into the Trusted Zone. Customers are provided with an API layer so they can avoid the nitty-gritty of the Spark + Persistent layer.

Trusted Zone – This zone holds all the cleaned data, ready for analytics purposes. The Trusted Zone becomes the Analytics store.
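
The flow between the three zones can be sketched as a simple routing step. The quality rules below (a non-empty customer_id and a non-negative numeric amount) are hypothetical examples, and plain Python lists stand in for the actual zone storage.

```python
from typing import Any


def passes_quality_checks(record: dict[str, Any]) -> bool:
    """Hypothetical quality rules, used only for this illustration."""
    amount = record.get("amount")
    return (
        bool(record.get("customer_id"))
        and isinstance(amount, (int, float))
        and amount >= 0
    )


def route(raw_zone: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Split raw records into (trusted, error) without modifying the raw zone."""
    trusted, error = [], []
    for record in raw_zone:
        (trusted if passes_quality_checks(record) else error).append(record)
    return trusted, error


raw_zone = [
    {"customer_id": "c1", "amount": 120.0},   # clean
    {"customer_id": "", "amount": 50.0},      # missing identifier
    {"customer_id": "c3", "amount": "n/a"},   # non-numeric amount
]
trusted_zone, error_zone = route(raw_zone)

# An analyst reworks a disputed record and re-submits it.
fixed = dict(error_zone[0], customer_id="c2")
if passes_quality_checks(fixed):
    trusted_zone.append(fixed)
```

The raw zone is left untouched throughout, matching the 'as is' property described above.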

Data Preparation Layer

The data preparation layer takes care of data transformation and cleansing activities. Beyond our own supported set of transformations, it includes Spark computation features such as, but not limited to, Spark SQL and Spark ML pipelines.

Our Scala scripting provides support for custom transformations.

Our BDB ETL tool provides an extensive toolset for data transformation and cleansing.

The Metadata layer contains all the mapping rules and data attributes about the data input and data output workflow.

The Engine reads data from the Data Lake through the API and performs all the transformations and calculations required to store the desired result in a compute store or an analytical store.
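
One way to picture metadata-driven transformation is a list of mapping rules that an engine applies to each input record. The rule layout (source, target, transform) and the field names are assumptions made for this sketch, not the BDB metadata schema.

```python
# Hypothetical mapping rules: input attribute, output attribute, conversion.
MAPPING_RULES = [
    {"source": "cust_nm", "target": "customer_name", "transform": str.strip},
    {"source": "ord_amt", "target": "order_amount", "transform": float},
]


def apply_mapping(record: dict, rules: list[dict]) -> dict:
    """Build the output record by applying each rule to its source field."""
    return {
        rule["target"]: rule["transform"](record[rule["source"]])
        for rule in rules
    }


row = {"cust_nm": "  Acme Ltd ", "ord_amt": "199.90"}
mapped = apply_mapping(row, MAPPING_RULES)
```

Keeping the rules as data means a workflow can be changed by editing metadata rather than engine code.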

Our robust MDS UI lets users create Compute (Data Pipeline) workflows and schedule them on top of a Data Lake. All MDS governance can be managed through the MDS UI.

Compute Layer

The BDB Analytics Layer takes a data mart or data store-centric approach built around a set of business metrics. After data preparation, the trusted data from the Data Lake can be aggregated, calculated, or filtered to suit a narrower business need and written into an Elasticsearch store or a Cassandra store using Spark computation logic and Spark SQL. These analytic data stores can be used for reporting, dashboarding, or analytical purposes via our visualisation tool. Customers are provided with an API set to exchange data easily, avoiding the complexity of the analytics layer.

Note: The Data Lake also has an API set for its interactions.
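
The aggregate-and-filter step of the compute layer can be sketched as follows. Plain Python stands in for Spark SQL here, and the metric (revenue per region), the field names, and the function revenue_by_region are hypothetical.

```python
from collections import defaultdict


def revenue_by_region(trusted_records: list[dict], min_amount: float = 0.0) -> dict:
    """Filter trusted records, then aggregate them into one business metric."""
    totals: dict[str, float] = defaultdict(float)
    for record in trusted_records:
        if record["amount"] >= min_amount:           # filter
            totals[record["region"]] += record["amount"]  # aggregate
    return dict(totals)


# Assumed sample of already-cleaned data from the Trusted Zone.
trusted_records = [
    {"region": "EU", "amount": 100.0},
    {"region": "EU", "amount": 50.0},
    {"region": "US", "amount": 75.0},
]
metrics = revenue_by_region(trusted_records)
large_orders = revenue_by_region(trusted_records, min_amount=75.0)
```

The resulting dictionary plays the role of the narrower analytic store that reports and dashboards would read from.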

Advanced Analytics

BDB’s Advanced Analytics capabilities span ad-hoc statistical analysis, predictive modeling, real-time scoring, machine learning, Elasticsearch, and much more. They help organizations discover patterns and trends in structured and unstructured data, so they can go beyond knowing what has happened to anticipating what is likely to happen next. The BDB platform has a very strong Predictive Analytics product. The Predictive tool has basic transformation capabilities and integrates with R and Spark ML. One can write scripts in R, Spark ML Scala, and Python to create the models businesses need to find new opportunities, reduce risks, and increase revenue. Python is also used to create the computed views businesses need.

BDB data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see the analytics visually, so they can grasp difficult concepts while identifying new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed. BDB provides all forms of Data Visualisation (Reports, Dashboards, Advanced Analytics & Self-Service BI), covering every stakeholder in an organization.

BDB Pipeline Features

All Integrated in One

  • The BDB system works as one integrated platform/ecosystem on the cloud rather than in silos like other vendors.

Reduces deployment time with Multiple Products

  • Big Data Pipeline, ETL, Query Services, Predictive Tool with Machine Learning, Survey Platform, Self Service BI, Advanced Visualization, and Dashboards are all provided in one platform.

Customer Approval for BDB Pipeline

  • When BDB was compared with several marquee brands, buyers admitted that deploying a BI solution would otherwise have taken them several years, whereas BDB did it in a few months.

Highly Scalable

  • A customer can scale from a few hundred users to 100 million in near real time. Such business solutions can be made available in forms ranging from SaaS to white label.

Create & Sell your own Subscription Services & Licenses

  • Instead of selling a third-party licensing package, a customer can sell their own Subscription Services and Licenses. The market opportunity is 10x the investment, with additional Analytics Services revenue.

Never misses a Project after Proof of Concept

  • BDB has a 100% track record after a Proof of Concept (POC).