
Intelligent execution of ETL processes

In mature and complex enterprise DWH systems, many ETL processes are developed to collect data from sources, perform various transformations and finally expose that data to business users. If somebody woke you up in the middle of the night and asked you to enumerate all the processes that have to be executed before the one critical for an important report you have to deliver to management, you would rightly say no.
An intelligent system that takes care of dependencies among processes answers this question in a couple of clicks. It knows which processes can be started under which conditions, and starts them as soon as all their prerequisites are fulfilled. This principle is called RWYC (run what you can).
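To make the idea concrete, here is a minimal Python sketch of an RWYC loop; the process names and the way jobs are submitted and polled are invented for the example:

```python
import time

# Hypothetical dependency map: each process lists its prerequisites.
processes = {
    "load_customers": set(),                         # no prerequisites
    "load_orders":    {"load_customers"},
    "build_report":   {"load_customers", "load_orders"},
}

finished, running = set(), set()

def start(name):
    # Stand-in for submitting the real ETL job to the execution engine.
    print(f"starting {name}")
    running.add(name)

while len(finished) < len(processes):
    for name, prereqs in processes.items():
        # Run what you can: start anything whose prerequisites are all done.
        if name not in finished and name not in running and prereqs <= finished:
            start(name)
    finished |= running      # stand-in for polling: running jobs "complete"
    running.clear()
    time.sleep(0.1)
```

The loop never decides an order up front; it simply starts whatever has become runnable, which is exactly why a dependency-aware engine can answer the midnight question for you.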

How does the system know that?

Each time you develop a new ETL process, your ETL tool saves its metadata in a repository. NEOS DI Framework looks into the repository to find out which source your process is reading and which target it is writing to. Once the source object of your ETL process is known, you can find the name of the process that loads it, then find that process’s source object… and so on… you’ve got it, right? A reliable framework resolves all those dependencies down to the root and makes them easy to read.
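As a rough illustration of that resolution, the sketch below walks an invented, simplified (process, sources, target) structure back to the root; the real Framework reads this information from the ODI repository tables instead:

```python
# Invented repository metadata: (process, sources read, target written).
repository = [
    ("stg_customers", ["src.customers"], "stg.customers"),
    ("stg_orders",    ["src.orders"],    "stg.orders"),
    ("dwh_customers", ["stg.customers"], "dwh.customers"),
    ("dwh_orders",    ["stg.orders"],    "dwh.orders"),
    ("rpt_sales",     ["dwh.customers", "dwh.orders"], "rpt.sales"),
]

loader_of = {target: process for process, _, target in repository}
reads = {process: sources for process, sources, _ in repository}

def prerequisites(process, seen=None):
    """Collect every loader of every source, recursively, down to the root."""
    seen = seen if seen is not None else set()
    for source in reads.get(process, []):
        loader = loader_of.get(source)   # None means a true source system
        if loader and loader not in seen:
            seen.add(loader)
            prerequisites(loader, seen)
    return seen

print(sorted(prerequisites("rpt_sales")))
# ['dwh_customers', 'dwh_orders', 'stg_customers', 'stg_orders']
```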

Just click on the Graph

DI Console is a very useful NEOS DI Framework component. Remember that important ETL process from the midnight wake-up? Select it in the console, click the Graph button and voilà…
Scheduling ETL processes is rarely simple. Most of the time you need many other options, because relying on prerequisites automatically identified from the ODI repository is not enough. Sometimes you might want to add exceptions, or create your own rules. Or your ETL process could consist of code from which the framework cannot easily detect source and target objects. In such cases, you enter dependencies manually and the Framework uses them as if they came from the repository. Besides adding dependencies, you may need to add exceptions that exclude some rules under certain circumstances, and the Framework counts those in as well.
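Conceptually, the effective prerequisite set is then the union of detected and manually entered dependencies, minus the exceptions; the names below are invented for the sketch:

```python
# Dependencies detected from the repository, entered manually, and excluded.
detected = {"build_report": {"load_orders", "load_customers"}}
manual   = {"build_report": {"fx_rates_api_pull"}}   # code the parser can't read
excluded = {"build_report": {"load_customers"}}      # exception to the rule

def effective_prerequisites(process):
    deps = detected.get(process, set()) | manual.get(process, set())
    return deps - excluded.get(process, set())

print(sorted(effective_prerequisites("build_report")))
# ['fx_rates_api_pull', 'load_orders']
```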

Make a dev happy

The Framework shields you from the logic of calculating whether the prerequisites for executing an ETL process are fulfilled. It lets developers focus on translating business logic into data.
Once an ETL process is developed and tested, and the business user green-lights UAT to production, developers have to give the Framework some important details. How frequently will this process be executed (daily, monthly, or maybe ‘every third Tuesday in the month’)? You might also want to pick the right calendar for the Framework to use: for processes executed in an international environment, a rule like ‘third working day of the month’ may result in different dates in different countries.
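To see why the calendar choice matters, the sketch below computes a ‘third working day of the month’ rule against two national calendars. It relies on the third-party holidays Python package, which is an assumption for the example, not part of the Framework:

```python
from datetime import date, timedelta
import holidays  # third-party package: pip install holidays

def third_working_day(year, month, country):
    """Third day of the month that is neither a weekend nor a public holiday."""
    calendar = holidays.country_holidays(country)
    day, counted = date(year, month, 1), 0
    while True:
        if day.weekday() < 5 and day not in calendar:
            counted += 1
            if counted == 3:
                return day
        day += timedelta(days=1)

# The same rule resolves to different dates under different calendars:
print(third_working_day(2024, 10, "US"))  # 2024-10-03
print(third_working_day(2024, 10, "DE"))  # 2024-10-04 (Oct 3 is a holiday in Germany)
```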
You may need to define priorities among your processes, since the prerequisites of many processes could become fulfilled at the same time. Some processes are probably favourites that are far more important to business users, and you want to execute them before others. Bear in mind that not all processes need the same amount of resources, and some of them sometimes seem to grab everything for themselves, so you may need to limit them to allow other processes to finish on time. All this parameterization is done in the Framework, not in the ETL process.
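One way to picture that parameterization is a priority queue combined with a resource cap; the priorities, slot counts and limit below are invented for the example:

```python
import heapq

MAX_SLOTS = 4    # assumed overall resource limit
used_slots = 0

# Ready processes as (priority, name, slots needed); lower number runs first.
ready = []
for job in [(1, "rpt_sales", 2), (5, "archive_logs", 3), (2, "dwh_orders", 1)]:
    heapq.heappush(ready, job)

while ready and used_slots < MAX_SLOTS:
    priority, name, slots = heapq.heappop(ready)
    if used_slots + slots > MAX_SLOTS:
        heapq.heappush(ready, (priority, name, slots))  # wait for free slots
        break
    used_slots += slots
    print(f"dispatching {name} (priority {priority}, {slots} slot(s))")
# dispatching rpt_sales (priority 1, 2 slot(s))
# dispatching dwh_orders (priority 2, 1 slot(s))
```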

What does a dynamic execution plan mean?

Once you put your ETL process into the production environment, all you have to do is set a single parameter in the Framework to start prerequisite checks, and when all of them are fulfilled, the Framework starts your process. That way, the Framework makes sure your processes are started as early as possible. Prerequisite checking is performed continuously.
You want to check what is going on with that process running in production? Has it started, and when? How long did its execution last? Was it successful or not? Or… maybe it has not started yet, because the processes it depends on have not finished? Just check in the DI Framework Console.
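Behind the console, those questions boil down to a lookup in the execution log. The record layout below is invented for illustration; the Framework keeps its own:

```python
from datetime import datetime

# Hypothetical run-log entries for two processes.
run_log = {
    "rpt_sales":  {"started": datetime(2024, 10, 4, 2, 15),
                   "finished": datetime(2024, 10, 4, 2, 42),
                   "status": "SUCCESS"},
    "dwh_orders": {"started": None, "finished": None,
                   "status": "WAITING", "waiting_on": ["stg_orders"]},
}

for name, run in run_log.items():
    if run["started"] is None:
        print(f"{name}: not started, waiting on {run['waiting_on']}")
    else:
        print(f"{name}: {run['status']} in {run['finished'] - run['started']}")
```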

A data warehouse with different ETL tools, each with its own metadata repository?

NEOS DI Framework has a layered architecture. Reading from and writing to each ETL tool repository is handled in one of the Framework’s layers, so that all upper layers deal with unified metadata structures.
That enables simultaneous work with multiple ETL tool repositories (e.g. OWB and ODI11 and/or ODI12), and unified parameterization on top of all of them. It comes in handy when transitioning from one ETL tool to another: during the transition you bring in ETL processes developed with the new ETL tool gradually. You want both types of ETL processes to be orchestrated by the same Framework, which means that the statuses old processes write to the Framework can be used by the Framework’s engine to calculate prerequisites for new processes.
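In code, that layering could be pictured as one adapter per tool repository, each emitting the same unified structure; the classes and rows below are illustrative, not the Framework’s actual API:

```python
from abc import ABC, abstractmethod

class RepositoryAdapter(ABC):
    @abstractmethod
    def read_processes(self):
        """Return (process, sources, target) rows in a tool-independent form."""

class Odi12Adapter(RepositoryAdapter):
    def read_processes(self):
        # Real code would query the ODI 12c work repository here.
        return [("dwh_orders", ["stg.orders"], "dwh.orders")]

class OwbAdapter(RepositoryAdapter):
    def read_processes(self):
        # Real code would query the OWB repository here.
        return [("stg_orders", ["src.orders"], "stg.orders")]

# Upper layers see one merged metadata view, so an OWB process can be a
# prerequisite for an ODI process and vice versa.
adapters = [Odi12Adapter(), OwbAdapter()]
metadata = [row for adapter in adapters for row in adapter.read_processes()]
print(metadata)
```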
And that’s exactly what the NEOS DI Framework allows you to achieve. And much more besides: if you are interested in more details, feel free to contact us.
