Data Science Lab
New Features
- Data Science Lab Models: Implemented custom scripts for the seamless saving, loading, and prediction of Data Science Lab models. This enables users to build data models more efficiently and with improved accuracy.
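The save/load/predict workflow can be sketched in plain Python. This is an illustrative outline only: the `pickle`-based helpers and the `ThresholdModel` class below are assumptions for the example, not the product's actual scripts.

```python
import pickle
from pathlib import Path

class ThresholdModel:
    """Toy stand-in for a trained Data Science Lab model (illustrative only)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Label each value 1 if it exceeds the threshold, else 0.
        return [1 if v > self.threshold else 0 for v in values]

def save_model(model, path):
    # Serialize the trained model object to disk.
    Path(path).write_bytes(pickle.dumps(model))

def load_model(path):
    # Restore a previously saved model for prediction.
    return pickle.loads(Path(path).read_bytes())
```

A typical round trip would be `save_model(ThresholdModel(0.5), "model.pkl")` followed by `load_model("model.pkl").predict([...])`.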
- AutoML Experiments
- Enhanced AutoML Model Training Process: We have introduced a dynamic Jobs system to support the AutoML model training process, replacing the previous static container implementation.
- Improved Model Explainability: As part of the same Job, SHAP values are now generated automatically, providing valuable insight into the factors influencing model predictions. These values can be visualized using the Explainability dashboard.
- Streamlined Job Management: Once the training process is completed, the associated Jobs are automatically terminated, optimizing resource allocation and ensuring efficient utilization of computing resources.
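SHAP values are normally produced with the `shap` library; as a self-contained illustration of what they measure, here is an exact Shapley-value computation by coalition enumeration (practical only for a handful of features). This is a generic sketch, not the implementation used by the Jobs system.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, by enumerating feature coalitions.

    predict  -- function mapping a feature-value list to a prediction
    x        -- the instance being explained
    baseline -- reference values features fall back to when 'absent'
    """
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley coalition weight |S|!(n-|S|-1)!/n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values
```

For a linear model, each feature's Shapley value reduces to its weight times its deviation from the baseline, which makes the enumeration easy to sanity-check.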
- Introduction of Working Directory:
- A new option in the interface allows users to create folders and files directly within the Notebook page, supporting better collaboration and easier access.
- Files can now be uploaded directly into the Data folder located within the respective Projects. This centralized location ensures that data files are conveniently stored alongside the relevant project, simplifying data management.
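Once files live in a project's Data folder, notebook code can reference them by a relative path. The `Data` folder layout and the CSV format below are assumptions for illustration, not a documented API.

```python
import csv
from pathlib import Path

def load_project_csv(project_dir, filename):
    """Read a CSV from the project's Data folder into a list of row dicts."""
    data_path = Path(project_dir) / "Data" / filename  # hypothetical layout
    with data_path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Keeping reads relative to the project directory means notebooks keep working when the project is shared or moved.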
- Algorithms: We have introduced unsupervised Forecasting and Natural Language Processing algorithms within the Data Science Lab Notebooks. These algorithms help users uncover valuable insights from complex datasets.
Please Note: The above-listed Algorithm types can be configured in the Administration module (under the Data Science Lab Settings) as well as in the Project-level settings (inside the Data Science Lab module).
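To give a flavour of what a forecasting algorithm in a notebook does, here is a minimal moving-average forecaster. It is a generic sketch for illustration, not the algorithm shipped with the module.

```python
def moving_average_forecast(series, window, steps):
    """Forecast `steps` future points by averaging the last `window` values.

    Each forecast is appended to the history so multi-step horizons
    roll forward on their own predictions.
    """
    history = list(series)
    forecasts = []
    for _ in range(steps):
        avg = sum(history[-window:]) / window
        forecasts.append(avg)
        history.append(avg)
    return forecasts
```

For example, `moving_average_forecast([1, 2, 3, 4], window=2, steps=2)` averages the last two observations at each step, yielding `[3.5, 3.75]`.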
- PySpark Environment:
- Expanded support for Data Sets as readers within the PySpark Environment.
- Introduced comprehensive support for Data Writers within the PySpark Environment. This enables users to write and save data efficiently, ensuring seamless integration with downstream processes.
Please Note: The supported Data Sets and Data Connector types for the Data Set readers and Data Writers in the PySpark Environment are MySQL, MSSQL, Oracle, MongoDB, PostgreSQL, and ClickHouse.
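Most of the listed connectors are reached over JDBC in PySpark. The helper below is a hypothetical sketch of how such a connection URL might be assembled; the option names in the commented usage follow PySpark's standard JDBC data source, but the helper itself is not part of the product.

```python
def jdbc_url(connector, host, port, database):
    """Build a JDBC URL for a PySpark reader/writer option map.

    MongoDB is omitted: it uses the Spark MongoDB connector's
    `mongodb://` URI rather than JDBC. Oracle's thin-driver URL has
    its own `@host:port/service` shape and is also left out here.
    """
    if connector == "MSSQL":
        # SQL Server uses a semicolon-delimited property, not a path.
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    prefixes = {
        "MySQL": "jdbc:mysql",
        "PostgreSQL": "jdbc:postgresql",
        "ClickHouse": "jdbc:clickhouse",
    }
    return f"{prefixes[connector]}://{host}:{port}/{database}"

# With a SparkSession and the matching JDBC driver on the classpath,
# a read/write round trip looks roughly like:
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url("MySQL", "db-host", 3306, "shop"))
#       .option("dbtable", "sales")
#       .option("user", user).option("password", pw).load())
# df.write.format("jdbc").option("url", url) \
#       .option("dbtable", "sales_copy").mode("append").save()
```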
Enhancements
- Progress Bar: Added a custom Progress Bar implementation inside the Data Science Lab Notebook.
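A notebook-style text progress bar can be sketched as follows; the rendering format here is an assumption for illustration, not the product's actual widget.

```python
def render_progress(current, total, width=20):
    """Return a fixed-width text progress bar, e.g. '[##########----------]  50%'."""
    fraction = 0 if total == 0 else current / total
    filled = int(round(width * fraction))
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction:>4.0%}"
```

In a long-running cell this would typically be printed with a carriage return (`print(render_progress(i, n), end="\r")`) so the bar updates in place.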