
Streamlining Data Integration: Unlocking Efficiency with Informatica and Incorta

Introduction: 

Integrating different data platforms can be a challenging task for businesses. However, it is essential to ensure a seamless flow of data and insights across different systems to enable an effective decision-making process. In this blog post, we will discuss how to integrate Informatica and Incorta, two powerful data platforms, to create a more efficient data processing and analysis workflow. 

The goal of this integration is twofold. Users who rely on Informatica to load data from data sources, cleanse it, and explore the data catalog should be able to push the processed data to Incorta, and they should be able to explore the Incorta metadata on the Informatica data catalog platform. To streamline this integration, we have built two tools: the Incorta Custom Scanner plugin and the Incorta Load Automation tool.

The Incorta Custom Scanner pulls the metadata from Incorta and injects it into Informatica. This plugin allows users to access all of the metadata from Incorta, including tables, insights, and data lineage on the Informatica data catalog platform. The Incorta Load Automation automates the process of syncing Incorta with Informatica. It should be added to the workflow in Informatica Data Quality. 

Setting up the integration system 

The setup starts with designing a workflow in Informatica Data Quality. The workflow should write its output as incremental parquet files to a shared disk, which can typically be Incorta’s shared disk.
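One way to picture the incremental drop is as one timestamped file per workflow run, written under a temporary name and renamed only once it is complete, so that a reader is less likely to pick up a half-written file. The sketch below is an assumption about the layout, not something Informatica Data Quality or Incorta prescribes: the `publish_batch` helper, the directory layout, and the file-naming pattern are all hypothetical, and the payload is treated as opaque bytes that the workflow has already encoded as parquet.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def publish_batch(staging_dir: Path, table: str, payload: bytes) -> Path:
    """Write one incremental batch as <table>_<UTC timestamp>.parquet.

    Hypothetical helper: `payload` is assumed to be parquet bytes
    produced upstream by the data quality workflow.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    final_path = staging_dir / f"{table}_{stamp}.parquet"

    # Write to a temporary file in the same directory first, so the
    # final name only appears once the content is fully written.
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(payload)

    # Rename into place (assumed naming scheme: table + timestamp).
    os.replace(tmp_name, final_path)
    return final_path
```

A rename within one directory is atomic on most local filesystems; whether that holds on a network share depends on the filesystem in use, so treat this as a starting point rather than a guarantee.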

The next step is to load the cleaned table into Incorta. This can be done by creating a data lake-based table in Incorta that reads the parquet files incrementally from the shared disk.

At this point, the data has been loaded and processed in Informatica and then loaded into Incorta. However, we still need to sync the load times of the two systems, and we need to clean the staging area once the files have been loaded into Incorta, since the staged files are no longer needed.

This is where the Incorta Load Automation tool comes in. It should be added to the end of the workflow as a Command Task. Once fired, it automatically triggers a schema load in Incorta and waits for the load to finish. Once the load completes, it deletes the parquet files from the shared data lake, as they are no longer needed. Because this happens on every workflow run, data stays in sync between Informatica and Incorta and both systems are kept up to date.
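The trigger, wait, and clean-up sequence can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Incorta Load Automation tool: `trigger_load` and `get_status` are hypothetical callables standing in for whatever mechanism starts a schema load and reports its state, and the status strings `"SUCCEEDED"` and `"FAILED"` are placeholders.

```python
import time
from pathlib import Path
from typing import Callable


def sync_and_clean(
    trigger_load: Callable[[], str],      # hypothetical: starts a load, returns a job id
    get_status: Callable[[str], str],     # hypothetical: returns a status string
    staging_dir: Path,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> int:
    """Trigger a load, wait for it, then delete the staged parquet files.

    Returns the number of files deleted. If the load fails or times
    out, an exception is raised and the staged files are left alone.
    """
    # Snapshot the files before triggering, so files that arrive while
    # the load is running are not deleted unread (a design assumption).
    staged = sorted(staging_dir.glob("*.parquet"))

    job_id = trigger_load()
    deadline = time.monotonic() + timeout_seconds

    while True:
        status = get_status(job_id)
        if status == "SUCCEEDED":
            break
        if status == "FAILED":
            raise RuntimeError(f"load {job_id} failed; staged files kept")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"load {job_id} did not finish in time")
        time.sleep(poll_seconds)

    # Only reached after a successful load.
    for path in staged:
        path.unlink()
    return len(staged)
```

Deleting only after a confirmed success means a failed load can simply be retried against the same files. A script like this would exit non-zero on the raised exception, which is one way a Command Task at the end of the workflow could surface the failure.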

Finally, we add the Incorta Custom Scanner to Informatica as a custom scanner and schedule it to run daily. This scanner retrieves all metadata from Incorta and makes it available in Informatica Enterprise Data Catalog (EDC). The metadata includes Incorta insights (exposed as tables) and the data lineage. With the scanner in place, users can find the metadata they need on the Informatica data catalog platform when creating workflows and data pipelines.

In conclusion, integrating Informatica and Incorta can significantly improve the efficiency of data processing and analysis workflows, as the success story of our valued customer neoleap shows. By using the Incorta Custom Scanner and Incorta Load Automation, users can seamlessly move data between the two platforms and access all the metadata they need to create workflows and data pipelines. This integration can help businesses make faster and better-informed decisions based on accurate data and insights.