How Streaming Works in SciDX:

Step 1: Register Data Sources
SciDX makes it easy to integrate various data sources, including:
- Kafka topics
- URL-based resources like CSV, JSON, TXT, and NetCDF files
When registering a data source, you can specify:
- Resource Names: Give each source a meaningful identifier.
- URLs: The location of each resource.
- File Types: Supported formats (CSV, JSON, TXT, NetCDF).
- Metadata and Processing Configurations: Optional settings to specify fields and apply processing:
  - Mapping: Choose specific fields to include in your streams, renaming them as needed. If no mapping is specified, all fields are included by default.
  - Processing: Customize how the data is handled. If left unspecified, SciDX automatically optimizes settings based on the data type, streamlining configuration.
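To make the mapping rule concrete, here is a minimal sketch in plain Python. It does not call SciDX: the `source_config` keys and the `apply_mapping` helper are hypothetical names chosen for illustration, and the assumed mapping direction (source field name to new name) may differ from what SciDX expects.

```python
from typing import Any, Optional

# Hypothetical registration settings; key names are illustrative, not the SciDX API.
source_config = {
    "resource_name": "station_temperatures",
    "url": "https://example.org/data/stations.csv",
    "file_type": "CSV",
    # Assumed direction: source field name -> name used in the stream.
    "mapping": {"temp_c": "temperature", "ts": "timestamp"},
}


def apply_mapping(record: dict[str, Any], mapping: Optional[dict[str, str]]) -> dict[str, Any]:
    """Keep only the mapped fields, renaming them; with no mapping, keep every field."""
    if not mapping:
        return dict(record)
    return {new: record[old] for old, new in mapping.items() if old in record}


row = {"temp_c": 21.5, "ts": "2024-01-01T00:00:00Z", "station": "A1"}
mapped = apply_mapping(row, source_config["mapping"])  # only the two renamed fields
unmapped = apply_mapping(row, None)                    # all fields, unchanged
```

With a mapping, the `station` field is dropped because it was not selected; without one, the record passes through unchanged.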
Step 2: Create Kafka Streams
Once your data sources are registered, you can generate Kafka streams tailored to your needs by:
- Combining Data Sources: Merge data from multiple registered sources.
- Applying Filters: Refine streams to include only the information that matters.
SciDX ensures efficient resource management, automatically closing inactive streams and scaling consumers dynamically to save resources.
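Combining sources and then filtering can be pictured with the sketch below. It uses in-memory lists in place of Kafka topics, and `build_stream` is a hypothetical helper, not a SciDX function. It also simply concatenates the sources one after another, whereas a live stream would interleave records as they arrive.

```python
from itertools import chain
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


def build_stream(
    sources: Iterable[Iterable[Record]],
    predicate: Callable[[Record], bool] = lambda record: True,
) -> Iterator[Record]:
    """Concatenate records from every source and yield those that pass the filter."""
    for record in chain.from_iterable(sources):
        if predicate(record):
            yield record


# Two stand-ins for registered data sources.
sensors_a = [{"station": "A1", "temperature": 21.5}, {"station": "A2", "temperature": 35.0}]
sensors_b = [{"station": "B1", "temperature": 38.2}]

# Merge both sources and keep only readings above 30 degrees.
hot = list(build_stream([sensors_a, sensors_b], lambda r: r["temperature"] > 30))
```

Because `build_stream` is a generator, records are produced lazily, which mirrors how a stream consumer sees one record at a time.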
Step 3: Filtering and Data Management
SciDX provides powerful filtering capabilities to manage and refine your data streams based on specific conditions.
Filtering Options:
- Column Comparisons: Compare values across different columns.
- Mathematical Operations: Use mathematical expressions to set filter rules.
- Nested Values: Handle complex, nested data structures such as arrays.
- Conditional Logic: Implement IF-THEN-ELSE workflows to manage data based on dynamic conditions.
- Logical Operators: Use AND/OR operators for compound filters. (Support for grouping conditions with parentheses is coming soon.)
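The filtering options above can be read as predicates over a single record. The sketch below expresses each one in plain Python to show the intended semantics. It is not the SciDX filter syntax, and the field names (`observed`, `forecast`, `readings`) are made up for the example.

```python
from typing import Any

Record = dict[str, Any]


def column_comparison(r: Record) -> bool:
    # Compare two columns of the same record.
    return r["observed"] > r["forecast"]


def math_rule(r: Record) -> bool:
    # Apply arithmetic before comparing: Celsius converted to Fahrenheit.
    return r["observed"] * 9 / 5 + 32 > 90


def nested_value(r: Record) -> bool:
    # Reach into a nested array: the first element of "readings".
    return r["readings"][0] >= 10


def conditional(r: Record) -> Record:
    # IF-THEN-ELSE: tag the record rather than dropping it.
    label = "alert" if r["observed"] > 35 else "normal"
    return {**r, "status": label}


def compound(r: Record) -> bool:
    # AND/OR without parentheses. Python binds AND tighter than OR;
    # the precedence SciDX applies is not stated in this document.
    return column_comparison(r) and math_rule(r) or nested_value(r)


record = {"observed": 36.0, "forecast": 30.0, "readings": [4, 12]}
tagged = conditional(record)
```

Until grouping with parentheses is supported, a compound rule like `compound` depends on operator precedence, so it is worth checking how a mixed AND/OR filter is evaluated before relying on it.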
Upcoming Feature: Window-Based Filtering
Soon, SciDX will support window-based filtering, enabling moving averages, event-based windowing, and other advanced operations, adding a new layer of precision to real-time data processing.
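Since window-based filtering is not yet available, the following is only a generic illustration of what a moving average over a fixed-size window computes. The `moving_average` function is a hypothetical helper and says nothing about how SciDX will expose the feature.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values, starting once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buffer: deque[float] = deque(maxlen=window)
    total = 0.0
    for value in values:
        if len(buffer) == window:
            total -= buffer[0]  # the oldest value is about to be evicted
        buffer.append(value)
        total += value
        if len(buffer) == window:
            yield total / window


averages = list(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
```

Keeping a running total means each new record costs constant work regardless of window size, which matters when the input is an unbounded stream.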
Smart Stream Management and Resource Optimization
Efficiently managing resources is a core feature of SciDX. The platform ensures optimal utilization by:
- Closing Idle Consumers: When consumers are idle, they are closed to prevent unnecessary resource use.
- Stopping Unused Streams: If a stream has no data left to send, it is stopped automatically.
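The idea behind closing idle consumers can be sketched as a timeout check against the last observed activity. The `IdleTracker` class and its `idle_timeout` parameter are assumptions made for illustration; the document does not describe how SciDX detects idleness or what timeout it uses.

```python
import time
from typing import Callable


class IdleTracker:
    """Tracks the last activity time and reports when a consumer has gone idle."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self._clock = clock  # injectable so the logic can be exercised without sleeping
        self._last_activity = clock()
        self.closed = False

    def touch(self) -> None:
        # Call whenever the consumer receives a message.
        self._last_activity = self._clock()

    def close_if_idle(self) -> bool:
        # Mark the consumer closed once idle_timeout seconds pass with no activity.
        if not self.closed and self._clock() - self._last_activity >= self.idle_timeout:
            self.closed = True
        return self.closed
```

A supervising loop would call `touch()` on every message and `close_if_idle()` periodically, releasing the underlying consumer when it returns `True`.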
These features lay the groundwork for a fully adaptable, resource-efficient platform that meets the evolving needs of your data workflows.