BDB Release Notes 9.0

Platform

New Features

Repository Database conversion to MongoDB (with or without SSL).
Admin Module:

Single Sign-On (SSO): Implemented Tenant-specific SSO configuration for seamless and secure access to the platform.
Execution Settings: Users can now process data through execution settings for Data as API and VCS Git Pull/Push operations.
Added a new key, “CA File”, to the SSL Certificate Settings.
Added support for MongoDB SSL connections, configured in the Datastore Settings.
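
As an illustration of how a CA file typically enters a MongoDB SSL connection, the sketch below builds a connection string with the standard MongoDB URI options `tls` and `tlsCAFile`. The function name, host, credentials, and file path are hypothetical; the platform itself takes these values through the Datastore Settings and SSL Certificate Settings screens, and its internal implementation is not described in these notes.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(host, port, database, username, password, ca_file=None):
    """Build a MongoDB connection URI, enabling TLS when a CA file is given."""
    # Credentials must be percent-encoded so characters like '@' or '/' are safe.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    options = {}
    if ca_file:
        # `tls` and `tlsCAFile` are standard MongoDB connection-string options.
        options["tls"] = "true"
        options["tlsCAFile"] = ca_file
    query = f"?{urlencode(options)}" if options else ""
    return f"mongodb://{credentials}{host}:{port}/{database}{query}"


# Hypothetical values, for illustration only.
uri = build_mongo_uri(
    "mongo.example.internal", 27017, "repo",
    "bdb_user", "p@ss/word",
    ca_file="/etc/ssl/certs/mongo-ca.pem",
)
print(uri)
```

Without a CA file, the same helper returns a plain (non-SSL) URI, which mirrors the "with or without SSL" choice mentioned for the repository database.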

Data Center: The Job Status has been added to the Data Store list. It indicates the running status of the data store load job.
Security:

A new module icon has been provided for the Security module.
The security module now displays the email ID along with the full name of the user.

Data Catalog Search

New Features

Homepage: Introducing a homepage for the Data Catalog Search.
Refresh Catalog: Provided the Refresh Catalog icon for the Admin role users to refresh the Data Catalog list and bring in the latest updates committed by platform users to all the Data asset types.
Data Catalog crawling is done from the Repo MongoDB.
Lineage Structure Creation: Added functionality to create lineage structures with connecting assets.
Change History Tracking: Introduced a History tab to track changes in the selected data asset over a period of time.
Data Import Services: Implemented services to retrieve data from lineage collections.
Job Management: Created jobs for crawling, data profiling, and history tracking.
Data Profiling: Enabled data profiling to be displayed on the user interface for datasets.
Data Pipeline Lineage: Implemented data pipeline lineage structures with sources and destinations.
Project Status: Users can display and update the status of a specific project for a selected asset type. The available status options are Work in Progress, Verified, or Published to indicate the progress of their work.
Each Asset Type displays three standard tabs: Details, Lineage, and History. Additionally, some asset types have specific tabs:

Data Set: Columns, Sample Data, and Data Profile
Feature Store: Sample Data
Data Store: Columns
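
The tab layout and project status options listed above can be summarized as a small lookup structure. This is only a sketch of the rules as stated in these notes; the names `ProjectStatus`, `STANDARD_TABS`, `EXTRA_TABS`, and `tabs_for` are hypothetical and not part of the product.

```python
from enum import Enum


class ProjectStatus(Enum):
    """Status options available for a project of a selected asset type."""
    WORK_IN_PROGRESS = "Work in Progress"
    VERIFIED = "Verified"
    PUBLISHED = "Published"


# Tabs shown for every asset type.
STANDARD_TABS = ["Details", "Lineage", "History"]

# Additional tabs shown only for specific asset types.
EXTRA_TABS = {
    "Data Set": ["Columns", "Sample Data", "Data Profile"],
    "Feature Store": ["Sample Data"],
    "Data Store": ["Columns"],
}


def tabs_for(asset_type):
    """Return the standard tabs followed by any asset-specific tabs."""
    return STANDARD_TABS + EXTRA_TABS.get(asset_type, [])


print(tabs_for("Data Set"))
print(ProjectStatus("Verified"))
```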

Description: The Admin role users and owners of the asset can now add descriptions to their Data Catalog searches.
Tags: The Admin role users can add tags to assets to enhance searchability for other users.
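
To make the lineage features above more concrete, the following sketch models a lineage structure as a directed graph in which each connection links a source asset to a destination asset. The class, method names, and asset names are hypothetical and chosen for illustration; the notes do not describe how the Data Catalog stores lineage internally.

```python
from collections import defaultdict, deque


class Lineage:
    """A minimal directed graph of source -> destination connections."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def connect(self, source, destination):
        """Record that `source` feeds `destination`."""
        self._downstream[source].add(destination)
        self._upstream[destination].add(source)

    def upstream_of(self, asset):
        """Return every asset that feeds `asset`, directly or indirectly."""
        seen, queue = set(), deque([asset])
        while queue:
            # .get avoids creating empty entries for assets with no parents.
            for parent in self._upstream.get(queue.popleft(), ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical asset names.
lineage = Lineage()
lineage.connect("sales_datastore", "sales_dataset")
lineage.connect("sales_dataset", "sales_pipeline")
lineage.connect("sales_pipeline", "sales_report")
print(sorted(lineage.upstream_of("sales_report")))
```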

Data Preparation

New Features

New Transforms: Added the following transformations to the Data Preparation module.

ML Transforms

Convert value to column (One Hot encoding)
Discretize values
Expanding window transform
Feature agglomeration
encoding
Label encoding
Lag Transformation
Leave one out encoding
Principal component analysis
Rolling data
Singular Value decomposition
Target-based quantile encoding
Target encoding
Weight of evidence encoding
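
For readers unfamiliar with some of the ML transforms listed above, here is a minimal pure-Python sketch of what five of them compute: one-hot encoding, label encoding, lag, rolling data, and expanding window. The function names are hypothetical, the rolling and expanding examples use the mean as the aggregate, and the Data Preparation module's own options (ordering of codes, handling of missing values, available aggregates) may differ.

```python
def one_hot(values):
    """Convert value to column: one 0/1 column per distinct value."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def label_encode(values):
    """Label encoding: map each distinct value to an integer code."""
    codes = {c: i for i, c in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def lag(values, periods=1):
    """Lag transformation: shift values down by `periods`, padding with None."""
    return ([None] * periods + list(values))[: len(values)]


def rolling_mean(values, window):
    """Rolling data: mean of the last `window` values; None until the window fills."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def expanding_mean(values):
    """Expanding window: mean of all values seen so far."""
    out, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        out.append(total / count)
    return out


colors = ["red", "blue", "red"]
sales = [10, 20, 30, 40]
print(one_hot(colors))         # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
print(label_encode(colors))    # [1, 0, 1]
print(lag(sales))              # [None, 10, 20, 30]
print(rolling_mean(sales, 2))  # [None, 15.0, 25.0, 35.0]
print(expanding_mean(sales))   # [10.0, 15.0, 20.0, 25.0]
```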

Functions Transforms

Formula Based Transformation

Message Snackbars: Provided a close option in the error/success message snackbars so that users can dismiss them.
Data Set Level Suggestions: Data set level suggestions are now shown by default when the Data Preparation page opens, with the selected column highlighted.
Auto Preparation: The Auto-Prep button gets disabled after it is applied one time on the selected dataset.
Data Profile: A consolidated Data Profile tab is provided, displaying Profile, Chart, Suggestion, and Pattern.
Data Preparation Naming: Users need to give a name for Data Preparation to save performed actions on the selected dataset; auto-generated names are assigned if no name is provided.
Profiling Chart: Integrated the profiling chart into the menu bar.
Column Menu: The column menu appears as a drop-down icon instead of a three-bar menu for columns.
Angular Version Update: Upgraded Angular version from 7 to 14.
Transform Drawer UI: Implemented drawer UI for transforms.
Settings: Introduced the Settings icon, opening as a drawer containing SKIP & Total Rows options.

The Settings icon will be disabled for a new Data Preparation based on a Data Set.

Confirmation Drawer UI: A confirmation drawer is provided under the Steps tab for all the transforms that support the configuration dialog box.
Column Rename Option: Users can now rename a column by clicking its drop-down menu icon and selecting the Rename Column transform from the context menu.
Column Profiling: Improved efficiency for large-scale column changes, reducing time consumption.
Existing Transforms: Modified the Create new Columns checkbox to the New Column Name field for new transforms.
Total Rows: Each page now displays up to 200 rows of data. With up to 5 pages available at once, up to 1000 rows can be shown.
Source and Sample Size information: Included source and sample size information in the panel below.
Save Icon: Introduced a Save icon for saving data preparation. The applied Data Preparation steps will get auto-saved each time.
Close Icon: Replaced the Back icon with the Close icon in the Data Preparation workspace to close a Data Preparation workspace and navigate to the Data Preparation list.
Data Set Info Icon: A newly created Data Preparation based on a Data Set will contain an Information icon in the enabled status. Once the Preparation is saved the Information icon will be removed from the Data Preparation menu bar.
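
The Total Rows limits mentioned above can be worked through as simple arithmetic: 200 rows per page across 5 pages gives at most 1000 visible rows. The sketch below assumes that reading of the limits; the constant and function names are hypothetical.

```python
ROWS_PER_PAGE = 200  # maximum rows displayed on one page
MAX_PAGES = 5        # maximum number of pages available at once


def page_bounds(page, total_rows):
    """Return (start, end) row indices, end exclusive, for a 1-based page."""
    if not 1 <= page <= MAX_PAGES:
        raise ValueError(f"page must be between 1 and {MAX_PAGES}")
    # No more than 200 * 5 = 1000 rows are visible, however large the data is.
    visible = min(total_rows, ROWS_PER_PAGE * MAX_PAGES)
    start = min((page - 1) * ROWS_PER_PAGE, visible)
    end = min(start + ROWS_PER_PAGE, visible)
    return start, end


print(page_bounds(1, 5000))  # (0, 200)
print(page_bounds(5, 5000))  # (800, 1000)
print(page_bounds(2, 250))   # (200, 250)
```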

Enhancements

UI updates for Suggestions:

Proper placing of the Suggestions section with the name of the selected column.
The Apply button gets enabled only after the selection of one suggestion.
Implemented highlighting of suggestions for the selected column.

Please Note: The column-level suggestions are displayed based on the selected column from the dataset. The Suggestions section no longer supports the generic suggestions encompassing all columns in the dataset.

Report

New Features

Themes: 8 new themes are provided to visualize the report. They are:

Ultramarine Blue Color Palette
Pacific Color Palette
Briar Rose Color Palette
Ocean Current Color Palette
Pumping Space Color Palette
Barbie Pink Color Palette
Amber Palette
Blue Period

Enhancements

UX Enhancements

The Toggle button provided for switching between the old and new UI is vertical.
The search bar expands as more dimensions and measures are selected.
A Remove button has been provided to clear the search bar.
The ‘GO’ option has been converted into an icon.
A drag icon has been added for the dimensions and measures in the left side panel on the Design screen.

PDF Export Properties for the Grid chart: A Grid section has been added to choose the PDF export from the following options:

Screenshot: If this option is selected, the Report gets exported as screenshots.
Tabular: This option allows users to select columns from the Analyze mode for the PDF export.

Tab UX enhancement: Users can interchange the tab location.

Please Note:

The Alert functionality has been deprecated from the Report module.
ML View (Sentiment chart): The Column Stack chart has been removed from the chart list because it affects other functionality, such as Duplicate the View and Download. More chart options will be provided in a future release.

Designer

Enhancements

Multiple Label selection is provided for the Checkbox filter component.
TreeMap component: Added a property to control the stroke line between blocks.

Data Science Lab

New Features

Homepage: Introducing the Home page for the Data Science Lab module, providing a centralized hub for accessing key functionalities and project management from the same page.
Feature Store: Implementation of the Feature Store feature enables users to store, manage, and share feature sets across projects for enhanced reusability and collaboration in the data science workflows.
Default Settings Page: A Settings page has been added with default settings for project creation ensuring consistency across projects for the project setup.
Trash: Implemented a Trash page for temporarily storing and recovering deleted Projects and Feature Stores to prevent accidental data loss.
Model Explainability as Job: Introduced Model Explainability as a Job feature for DSL models, enabling users to generate model explanations and interpretability insights as part of automated job processes.
Python Library for Data Preparation: Introduced a Python library dedicated to data preparation tasks, providing a comprehensive set of tools and utilities to streamline data processing workflows.
Data Preparation UI Enhancement: Implemented Data Preparation as a Drawer in the user interface, offering a convenient and intuitive way for users to access data preparation functionalities within the platform without disrupting their workflow.
Sequential Cell Execution (UI): Added the ability to execute cells sequentially in the user interface, enabling users to run code cells one after another in a predetermined order, facilitating step-by-step execution in the data science workflows.
Function Parameter Descriptions for Datasets (UI): Added function parameter descriptions for Datasets in the user interface, providing users with clear and concise explanations of parameters to aid in dataset selection and configuration.
Implementation of Append Method in DS Lab Writers: Implemented the append method in the DS Lab writers, allowing users to append data to existing datasets supporting incremental data updates.
Project Creation Page as a Drawer: Modified the Create Project UI to display it as a Drawer for a more streamlined and efficient project creation experience.
Updated the PySpark version to 3.4.0 to enhance performance and compatibility with the latest technology.
Pull Multiple Utils Files from Git (Normal Projects): Added the ability to pull multiple Utility files from Git in normal projects, enabling users to easily incorporate utility functions and scripts stored in Git repositories into their projects.
Leave site confirmation pop-up has been added to avoid accidental page refresh or closure of the page with unsaved work.
Validation Option: A validation option is available for CPU and Memory at the Project level.
Import File: Provided Import File option for the Repo Sync Projects.
Add Folder Option: The Add Folder option is now available for normal Projects as well.
Registered Models & APIs: Introduced a new tab named Registered Models & APIs to easily access lists of registered Models and APIs.
Introduced the Preview option for files in the files folder.
Implemented Linter support: Linter checks code for potential errors, style issues, and adherence to coding standards, helping to improve code quality and consistency.

Enhancements

Save Notebook: Enhanced service support to provide a seamless user experience while saving a Notebook.
Workspace UI Enhancement: Enhanced the user interface for normal projects by introducing a repository tree within the Workspace tab, allowing users to easily navigate project files and directories.
Enhanced Refresh Option: The user gets redirected to the specific models, tabs, or notebooks while refreshing them.

Data Pipeline

New Features

Script Executor Job: Implemented support for Python, PySpark, Go, and Perl scripts in the Script Executor job.
Athena Reader: Introduced the Athena Reader component for enhanced data reading capabilities.
Python On-Demand Job: Implemented the Python on-demand job feature for efficient execution.
Alert View Update: Replaced the Yellow Information icon in the alert view for improved visibility.
Spark Job Format Flowchart: Added a format flowchart for Spark jobs to streamline job configuration.
Disabled the copy button for the Rule Splitter, File Splitter, and Schema Validator components, and for any component whose metadata is not saved.
Central Monitoring: Implemented central monitoring for pipelines and jobs for streamlined management.
Real-time Data flow: Provided Data flow stream for the active pipeline.
Pipeline Component Version Update: Added functionality to update pipeline component versions for improved compatibility and performance.
Edit Job Button: Added an Edit button on the List Job page for quick access to Job configurations.
Preview Panel Enhancements: Provided download option in CSV, Excel, and JSON formats in the Kafka preview panel for easier data analysis.
CSV File Format Enhancements: Provided Multiline, Custom header, and Separator fields support in CSV file format for HDFS reader, Sandbox reader, Azure Blob reader Spark, and Azure Blob reader Spark Docker (for Job/ Pipeline).
ORC File Type Support: Added ORC file type in Sandbox, HDFS, and S3 including reader and writer (for Job/Pipeline).
Expanded Job Configuration: Enhanced the Job List page with a total configuration view on expanding job details.
Pipeline Overview Enhancements: Enhanced the pipeline overview with a customizable color theme, a clear level checkbox, logo sizing options, and an expanded description text area.
Job Trigger Component: Introduced a Job Trigger Component for automated job scheduling and execution.
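
The CSV options named in the CSV File Format Enhancements item above (Multiline, Custom header, Separator) can be illustrated with Python's standard `csv` module. This is a sketch of what such options generally mean, not of the reader components themselves: the function name is hypothetical, and it assumes that a custom header supplies column names for a file that has no header row of its own.

```python
import csv
import io


def read_delimited(text, separator=",", header=None):
    """Parse delimited text; quoted fields may span several lines (multiline)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator)
    rows = list(reader)
    if header is None:
        # No custom header given: take column names from the first row.
        header, rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in rows]


# A pipe-separated file whose second field spans two lines.
raw = 'id|note\n1|"first line\nsecond line"\n2|single line\n'
for record in read_delimited(raw, separator="|"):
    print(record)

# A semicolon-separated file without a header row, read with custom column names.
print(read_delimited("1;a\n2;b\n", separator=";", header=["id", "value"]))
```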

Enhancements

Data Channel & Cluster Events Page UI Enhancement: Revamped the UI for the Data Channel & Cluster Events page, along with enhancements to all pipeline topics for improved usability and aesthetics.
Default Configuration Page UI Enhancement: Enhanced the UI of the Default Configuration page to provide a more intuitive and user-friendly experience.
System Pod Details Page Enhancements: The Spark operator logs are also displayed on the System Pod Details page.
Job History Page Enhancements: Enhanced the Job History Page to include system logs for Spark jobs within job details history, facilitating comprehensive job tracking and analysis.
Data Metrics Page Enhancement: Added a filter date range button with a drop-down menu to the Data Metrics Page for streamlined data analysis based on specific timeframes.
UI Enhancement: Pipeline/ Job property panel - Upgraded the UI for the Pipeline/ Job property panel to enhance user interaction and navigation during pipeline and job configuration.
Sandbox Writer Enhancement: Improved the Sandbox Writer functionality to support writing data as part files within a directory, optimizing data storage and retrieval.
Settings Enhancement: Enhanced the Default Configuration page of the Settings section to provide more comprehensive and customized configuration options.
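
The idea behind the Sandbox Writer enhancement above, writing one dataset as several part files inside a directory, can be sketched as follows. The function name, the `part-00000.csv` naming pattern, and the fixed rows-per-part split are assumptions made for illustration; the actual Sandbox Writer output layout is not specified in these notes.

```python
import csv
import tempfile
from pathlib import Path


def write_parts(rows, header, out_dir, rows_per_part):
    """Write `rows` into numbered CSV part files inside `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for part, start in enumerate(range(0, len(rows), rows_per_part)):
        path = out / f"part-{part:05d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)  # each part carries its own header
            writer.writerows(rows[start : start + rows_per_part])
        paths.append(path)
    return paths


# Usage with a temporary directory and hypothetical data.
with tempfile.TemporaryDirectory() as out_dir:
    parts = write_parts([[1, "a"], [2, "b"], [3, "c"]], ["id", "value"], out_dir, rows_per_part=2)
    print([p.name for p in parts])  # ['part-00000.csv', 'part-00001.csv']
```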
