Data Science Lab
New Features
- Data Science Lab Models: We have implemented custom scripts for saving, loading, and
running predictions with Data Science Lab models, streamlining the model lifecycle from
training to inference and improving the team's productivity.
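The exact script interface is product-specific; as a minimal sketch of the save/load/predict
pattern (assuming pickle-based serialization — `save_model`, `load_model`, and the linear
model are illustrative placeholders, not the Lab's actual API):

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the Lab's model storage location (hypothetical).
MODEL_DIR = Path(tempfile.gettempdir())

def save_model(params, name):
    """Persist fitted model parameters to disk (pickle-based sketch)."""
    path = MODEL_DIR / f"{name}.pkl"
    with open(path, "wb") as f:
        pickle.dump(params, f)
    return path

def load_model(name):
    """Reload previously saved model parameters for prediction."""
    with open(MODEL_DIR / f"{name}.pkl", "rb") as f:
        return pickle.load(f)

def predict(params, xs):
    """Apply a toy linear model y = w*x + b to a list of inputs."""
    return [params["w"] * x + params["b"] for x in xs]

save_model({"w": 2.0, "b": 1.0}, "demo_model")
params = load_model("demo_model")
print(predict(params, [0, 1, 2]))  # [1.0, 3.0, 5.0]
```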
- AutoML Experiments
- Enhanced AutoML Model Training Process: We have introduced a dynamic Jobs system to support
the AutoML model training process, replacing the previous static container implementation.
- Improved Model Explainability: As part of the same Job, SHAP (SHapley Additive
exPlanations) values are now generated automatically, providing insight into the factors
influencing each model's predictions. These insights can be visualized using the
Explainability dashboard.
- Streamlined Job Management: Once the training process completes, the associated Jobs
are terminated automatically, freeing compute resources for other workloads.
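SHAP values attribute each prediction to the individual input features. The Job computes
them for you; purely as an illustration of the underlying idea, here is an exact Shapley
value computation for a tiny toy model by brute-force subset enumeration (not the
platform's implementation, which would use the SHAP library's efficient estimators):

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: each feature's average marginal contribution
    over all feature subsets (feasible only for a handful of features)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual values; the rest fall
        # back to the baseline (reference) input.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

# Toy model: a weighted sum of three features.
model = lambda z: 3 * z[0] + 2 * z[1] + z[2]
phi = shapley_values(model, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # [3.0, 2.0, 1.0] — recovers the weights
```

For a linear model the Shapley value of each feature is simply its weight times the
feature's deviation from the baseline, which is what the output shows.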
- Introduction of Working Directory:
- We have introduced a user-friendly option in the interface that allows users to create new folders
and files directly within the Notebook page for better collaboration and easy access.
- Files can now be uploaded directly into the Data folder located within the respective Projects.
This centralized location ensures that data files are conveniently stored alongside the relevant
project, simplifying data management.
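The project layout is managed by the platform; as a sketch of the working-directory idea
(assuming, hypothetically, that each Project root contains a Data folder that notebooks
read from and write to):

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical project layout: <project root>/Data holds uploaded files.
project_root = Path(tempfile.mkdtemp()) / "MyProject"
data_dir = project_root / "Data"
data_dir.mkdir(parents=True, exist_ok=True)

# Write a small CSV into the project's Data folder ...
with open(data_dir / "sales.csv", "w", newline="") as f:
    csv.writer(f).writerows(
        [["month", "units"], ["Jan", "120"], ["Feb", "95"]]
    )

# ... and read it back from a notebook in the same project.
with open(data_dir / "sales.csv", newline="") as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['month', 'units']
```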
- Algorithms: We have introduced Unsupervised, Forecasting, and Natural Language
Processing algorithms within the Data Science Lab Notebooks, enabling insights to be
drawn from a wider range of datasets.
Please Note: The above-listed algorithm types are configured in the Administration module
(in the Data Science Lab Settings) as well as in the Project-level settings (inside the
Data Science Lab module).
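As a flavour of the kind of task these algorithms address — and not the Lab's own
implementation — here is a minimal moving-average forecast one might prototype in a
Notebook:

```python
def moving_average_forecast(series, window, horizon):
    """Naive forecast: each future point is the mean of the last
    `window` observations (prior forecasts included as they accrue)."""
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        nxt = sum(history[-window:]) / window
        forecasts.append(nxt)
        history.append(nxt)
    return forecasts

demo = moving_average_forecast([10, 12, 14, 16], window=2, horizon=3)
print(demo)  # [15.0, 15.5, 15.25]
```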
- PySpark Environment:
- Expanded support for Data Sets as readers within the PySpark Environment.
- Introduced comprehensive support for Data Writers within the PySpark Environment,
enabling data to be written and saved efficiently and integrated smoothly with
downstream processes.
Please Note: The supported Data Sets and Data
Connector types for the Data Set readers and Data Writers in the PySpark Environment are MySQL, MSSQL,
Oracle, MongoDB, PostgreSQL, and ClickHouse.
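The reader/writer wiring inside the Lab is platform-specific; as a hedged sketch of how
one of the supported connectors (MySQL) maps onto Spark's standard JDBC interface — host,
table, and credential names below are placeholders:

```python
def jdbc_options(host, port, database, table, user, password):
    """Assemble standard Spark JDBC options for a MySQL source."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }

def read_table(spark, opts):
    """Data Set reader: load a JDBC table into a Spark DataFrame."""
    return spark.read.format("jdbc").options(**opts).load()

def write_table(df, opts, mode="append"):
    """Data Writer: persist a DataFrame back over JDBC."""
    df.write.format("jdbc").options(**opts).mode(mode).save()

opts = jdbc_options("db-host", 3306, "sales", "orders", "lab_user", "secret")
print(opts["url"])  # jdbc:mysql://db-host:3306/sales

# In a PySpark notebook with an active SparkSession `spark`:
#   df = read_table(spark, opts)
#   write_table(df, opts)
```

Other supported connectors differ mainly in the JDBC URL scheme and driver class.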
Enhancements
- Progress Bar: Custom implementation of the Progress Bar inside the Data Science Notebook.
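The implementation details are not described in these notes; as a generic illustration of
the kind of textual progress bar a notebook cell might render:

```python
def render_bar(done, total, width=20):
    """Render a text progress bar, e.g. [##########----------] 50%."""
    filled = int(width * done / total)
    pct = int(100 * done / total)
    return f"[{'#' * filled}{'-' * (width - filled)}] {pct}%"

print(render_bar(5, 10))   # [##########----------] 50%
print(render_bar(10, 10))  # [####################] 100%
```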