NiFi for Apache - using MiNiFi


In this guide, we will use MiNiFi, the lightweight version of NiFi, running on an Apache web server to watch for new events written to the Apache access logs.
MiNiFi has no web interface and only a limited set of processors, so it needs few resources on the host where it runs.
It can be used as a forward-only agent that sends data to a central NiFi server you have previously set up.

Configuring MiNiFi

The easiest way to create a configuration file is to build the flow visually in the NiFi Web GUI, export it as an XML template, and convert it on the Apache host(s) with the MiNiFi toolkit.

Building a MiNiFi flow

So let’s start on our central NiFi server by building the MiNiFi configuration. Go to the Web GUI of the NiFi server.
  • Add an input port. When prompted, give it a meaningful name; we will call ours "RemoteMiNiFi". The port is created in a disabled state.
This Input Port will receive flow files from the remote MiNiFi agents.
  • Click the port on the workspace and, in the Operate palette, click the Enable button (the lightning bolt icon).
The Input Port is now enabled. A warning is still displayed, because the port is not yet connected to any component.

We will then use the NiFi workspace to create a basic flow, which we will export as an XML template and use as the MiNiFi base configuration.
  • Add a first processor, TailFile, and then a Remote Process Group. The latter is added using the 5th icon in the top bar of the NiFi workspace.
When adding the Remote Process Group, you are prompted for its configuration, in particular the URL of your central NiFi server.
When configuring the TailFile processor, we can:
  • Follow one single file
  • Follow multiple files, which must be located below the same Base Directory. You can use a regular expression to select the file names and their sub-directories.
  • Inform the processor about the rolling file strategy (a new name, the same name, …)
Configuration to follow one single Apache access log file, taking the log rotation into account:
  • Tailing mode: Single file
  • File(s) to tail: /var/log/apache2/access.log
  • Rolling filename pattern: ${filename}.log.* (${filename} stands for the name of the tailed file without its extension, so for access.log the pattern matches access.log.1, access.log.2.gz, and so on). If NiFi restarts while the system is rolling the log file, this setting allows NiFi to read the missed log lines from the rolled file and then continue with the main log file. The pattern assumes that the rolled log file lives in the same directory as the active log file.
  • Initial start position: Beginning of file. When started for the first time, this processor reads all the lines already present in the file, so a line may be read long after the event it records occurred. You should therefore have a processor somewhere in your flow that sets the timestamp from the date/time extracted from the event.
  • State location: local (This processor will be run by MiNiFi which is local to the Apache server, and not in a cluster of any kind).
  • Recursive lookup: false
  • Rolling strategy: Fixed name. After rotation, the active log file keeps the same name.
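To see which files the rolling filename pattern would pick up after a rotation, the sketch below expands ${filename}.log.* by hand for a given tailed file. It only approximates TailFile's behaviour, assuming (as the TailFile documentation describes) that ${filename} is the tailed file name without its extension and that rolled files sit in the same directory. rolled_files is a hypothetical helper, not part of NiFi or MiNiFi.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NiFi/MiNiFi): list the rolled files that
# the pattern "${filename}.log.*" would match for a tailed file.
# Assumptions: ${filename} is the tailed file name without its extension,
# and rolled files live in the same directory as the tailed file.
rolled_files() {
    local tailed="$1"
    local dir base name restore f
    dir=$(dirname -- "$tailed")
    base=$(basename -- "$tailed")
    name="${base%.*}"                          # access.log -> access
    restore=$(shopt -p nullglob) || true       # remember the caller's setting
    shopt -s nullglob                          # no match -> empty list, not the pattern
    local matches=("$dir/$name".log.*)
    eval "$restore"
    for f in "${matches[@]}"; do
        basename -- "$f"
    done
}
```

For example, after a logrotate run, rolled_files /var/log/apache2/access.log should list files such as access.log.1 or access.log.2.gz, and nothing when no rotation has happened yet.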
Configuration to follow more than one log file on the server, e.g. when each virtual host has its own Apache access log file:
  • Tailing mode: Multiple files
  • File(s) to tail: (drupal7|drupal8|gallery|webshop)\.log
  • Rolling filename pattern: ${filename}.log.*
  • Base directory: /var/log/apache2
  • Initial start position: Beginning of file
  • State location: local
  • Recursive lookup: false. Set it to false if all the log files to follow are directly in the base directory, or to true if they are in sub-directories.
  • Rolling strategy: Fixed name
(properties not listed here keep their default settings)
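Before saving the Multiple files configuration, it can help to check which file names the File(s) to tail regular expression really selects. The sketch below assumes that TailFile matches the whole file name against the expression when Recursive lookup is false, and it uses grep -E (POSIX extended regex) as a stand-in for Java's regex engine; the two agree on simple patterns like this one, not in general. It also shows why the alternatives must be grouped with parentheses: inside square brackets they form a character class that matches a single character. matching_logs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NiFi/MiNiFi): list the files directly
# under a base directory whose names fully match an extended regex,
# roughly as TailFile does in "Multiple files" mode without recursion.
# Assumption: the whole file name must match (hence grep -x).
matching_logs() {
    local base_dir="$1" regex="$2" f
    for f in "$base_dir"/*; do
        [ -f "$f" ] || continue                # skip sub-directories
        basename -- "$f"
    done | grep -Ex -- "$regex"
}
```

For example, matching_logs /var/log/apache2 '(drupal7|drupal8|gallery|webshop)\.log' prints the log files TailFile would be expected to follow; the bracketed form '[drupal7|drupal8|gallery|webshop].log' would instead match five-character names such as d.log.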
  • Then we drag a connection from the TailFile processor to the Remote Process Group.
If you right-click the Remote Process Group and choose "Remote Ports" in the menu, you should see the RemoteMiNiFi port defined earlier, provided you have followed all the steps above.
  • Select the processor, the Remote Process Group and the connection, then click the "Create template" icon in the Operate palette.
  • Give your template a name, e.g. Apache_minifi.
  • Open the global menu at the top right and choose "Templates".
  • In the next screen, a download button is shown to the right of the line containing the name of the template you have just created.
Click it; depending on your web browser settings, you may be prompted to save an "Apache_minifi.xml" file on your local computer.
Once saved locally, this file must be uploaded to each Apache server where you plan to run MiNiFi to send log events to your central NiFi server.
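Before copying the template to your Apache servers, a quick look at what it contains can catch a wrong export. The sketch below assumes that the exported template XML records component classes in <type> elements (check against one of your own templates, as the layout can vary between NiFi versions). template_types is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of NiFi): list the distinct values of the
# <type> elements in an exported template.
# Assumption: component classes are stored in <type>...</type> elements.
template_types() {
    grep -o -e '<type>[^<]*</type>' -- "$1" \
        | sed 's/<[^>]*>//g' \
        | sort -u
}
```

Running template_types ~/Apache_minifi.xml should list org.apache.nifi.processors.standard.TailFile among the results; other entries depend on your NiFi version.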
Once the XML file has been saved, you can delete all the elements belonging to the MiNiFi configuration, so that your workspace contains only the "RemoteMiNiFi" input port created at the beginning of this guide.

On each Apache host where MiNiFi will run

For the next steps of this guide, we have uploaded "Apache_minifi.xml" to the home directory of the account used to log on to the Apache server (admin).
We also need to download the MiNiFi packages from Hortonworks or from the NiFi web site. As we started with a Hortonworks DataFlow (HDF) setup, we will take the packages from Hortonworks. We need the MiNiFi Java Agent and the MiNiFi Toolkit. There is also a C++ version of the MiNiFi agent, but as it was still a technical preview at the time of writing, we are not going to use it (yet).
You will find the direct download URLs in the Release Notes of the HDF version you are using.
Extract the archives into a directory. In this guide, the agent is extracted to /opt/minifi and the toolkit to /opt/minifi/toolkit.
Now, you will have to:
  1. Install the service
  2. Convert the XML template into a valid YML MiNiFi configuration
  3. Start the service
websites(admin):/opt/minifi$ sudo bin/ install
websites(admin):/opt/minifi/toolkit$ sudo bin/ transform /home/admin/Apache_minifi.xml /tmp/config.yml
JAVA_HOME not set; results may vary
Java home:
MiNiFi Toolkit home: /opt/minifi/toolkit
No validation errors found in converted configuration.
websites(admin):/opt/minifi/toolkit$ cd ..
websites(admin):/opt/minifi$ sudo cp /tmp/config.yml conf/

If your NiFi server is not SSL-enabled, you are ready to start the MiNiFi agent (go to Launching MiNiFi).
If your NiFi server is SSL-enabled, go through the next section first.

Steps to perform if the NiFi server is SSL-enabled

With the NiFi toolkit, we will generate the private key, certificate and keystore needed to set up an SSL link between MiNiFi and our SSL-enabled NiFi server. Download and extract the NiFi toolkit package on the host running MiNiFi; the URL is also available in the Release Notes of your HDF version.
If you prefer, you can also install and run the NiFi toolkit on another machine; you will then have to copy truststore.jks and keystore.jks to the systems running MiNiFi. For security reasons, it is recommended to generate a separate keystore/truststore pair for each server. This way, if one host is compromised, you only have to disable the certificate corresponding to it.
bin/ client -c <nifi host FQDN> -D "CN=<name>, OU=nifi" -t <CA Token> -p 10443 -T PKCS12
  • “CA Token” is the secret CA Token defined in the Ambari configuration
  • “Name” can be anything you want, but the resulting DN must match the “Initial Admin” configured with Ambari. Type the DN exactly as you did in the NiFi configuration (Initial Admin parameter value): same case, same number of blanks.
If the CA token is wrong, you will see an error like this:
Service client error: Received response code 403 with payload {"hmac":null,"pemEncodedCertificate":null,"error":"forbidden"}
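Because the DN must be typed identically in every place, a quick local comparison can save a round of troubleshooting. The sketch below compares the DN passed to the toolkit with the identity you plan to enter in NiFi; it treats only a byte-for-byte match as valid, as this guide requires, and additionally reports when the two strings differ only by case or blanks. same_dn is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NiFi): compare two DNs exactly and say
# whether a mismatch is only a matter of case or blanks.
# Returns 0 only for an exact, byte-for-byte match.
same_dn() {
    local a="$1" b="$2" na nb
    if [ "$a" = "$b" ]; then
        echo "identical"
        return 0
    fi
    # Lowercase and remove spaces/tabs, then compare again.
    na=$(printf '%s' "$a" | tr '[:upper:]' '[:lower:]' | tr -d '[:blank:]')
    nb=$(printf '%s' "$b" | tr '[:upper:]' '[:lower:]' | tr -d '[:blank:]')
    if [ "$na" = "$nb" ]; then
        echo "differ only by case or blanks"
    else
        echo "different"
    fi
    return 1
}
```

For example, same_dn 'CN=web1, OU=nifi' 'CN=web1,OU=NIFI' reports that the two differ only by case or blanks, which is exactly the kind of mismatch to fix before adding the user in NiFi.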
Using the information in the config.json file created by a successful run of the NiFi toolkit, complete the “Security Properties” section of the generated config.yml file.
You need the following parameters:
Security Properties:
  keystore: '/opt/minifi/conf/keystore.jks'
  keystore type: 'jks'
  keystore password: '<see config.json>'
  key password: '<see config.json>'
  truststore: '/opt/minifi/conf/truststore.jks'
  truststore type: 'jks'
  truststore password: '<see config.json>'
  ssl protocol: 'TLS'
  Sensitive Props:
    key: '<any string, used to encrypt sensitive configuration values>'
    provider: BC
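Copying the passwords by hand is error-prone, so the sketch below prints the Security Properties block from config.json. It assumes config.json holds the keys keyStore, keyStoreType, keyStorePassword, keyPassword, trustStore, trustStoreType and trustStorePassword as quoted strings containing no quotes (check your own file). Taking the keystore type and file name from config.json keeps them consistent with the -T option given to the toolkit. json_value and security_properties are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of NiFi/MiNiFi): build the config.yml
# "Security Properties" block from the toolkit's config.json.
# Assumption: the keys listed below exist in config.json as string values.

# Print the string value of a JSON key (first occurrence only).
json_value() {
    local key="$1" file="$2"
    sed -n "s/.*\"$key\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file" | head -n 1
}

# $1: config.json, $2: secret used to encrypt sensitive properties,
# $3: directory holding the stores on the MiNiFi host.
security_properties() {
    local json="$1" sensitive_key="$2" conf_dir="${3:-/opt/minifi/conf}"
    local ks ts
    ks=$(basename -- "$(json_value keyStore "$json")")
    ts=$(basename -- "$(json_value trustStore "$json")")
    cat <<EOF
Security Properties:
  keystore: '$conf_dir/$ks'
  keystore type: '$(json_value keyStoreType "$json")'
  keystore password: '$(json_value keyStorePassword "$json")'
  key password: '$(json_value keyPassword "$json")'
  truststore: '$conf_dir/$ts'
  truststore type: '$(json_value trustStoreType "$json")'
  truststore password: '$(json_value trustStorePassword "$json")'
  ssl protocol: 'TLS'
  Sensitive Props:
    key: '$sensitive_key'
    provider: BC
EOF
}
```

For example, security_properties config.json "$(openssl rand -hex 16)" > /tmp/security.yml prints a block you can merge into /opt/minifi/conf/config.yml in place of the existing Security Properties section.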

You have created a certificate for your host running MiNiFi; now you must authorize it on your NiFi server.
In the NiFi Web GUI:
  1. Add a new user: open the Users list from the global menu, click the add user icon and, as identity, enter the DN used with the TLS toolkit, respecting the case and the spaces (CN=<name>, OU=nifi).
  2. Close the Users list.
  3. Now open the Policies list.
  4. Select the “Retrieve site-to-site details” policy and click the add user icon.
  5. In “User Identity”, start typing a name present in the DN of your newly created user.
  6. Once the name of your host appears, select it and click “ADD”.
  7. Close the list.
  8. Click the “RemoteMiNiFi” input port.
  9. Stop it if it is already running.
  10. In the Operate palette on the left, click the key icon (Access Policies).
  11. In the Access Policies, select “receive data via site-to-site”.
  12. Click the add user icon and select the host DN as in step 5. Click “ADD” once it is selected.
With these steps complete, the RemoteMiNiFi input port is ready to receive flow files from the MiNiFi agent running on your web server over an authenticated TLS connection.
Once keystore.jks and truststore.jks are in the locations specified in config.yml, you can launch the MiNiFi process.
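A missing or unreadable store is an easy mistake at this point, so the sketch below reads the keystore and truststore paths from config.yml and checks both files before you start MiNiFi. It assumes each path sits on its own keystore: or truststore: line, optionally in single quotes, as in the Security Properties example. yaml_path and check_stores are hypothetical helpers, not a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MiNiFi): verify that the
# keystore and truststore referenced in config.yml are readable.
# Assumption: one "keystore:" / "truststore:" line each, value optionally quoted.

# Print the value of a "<key>: value" line, without surrounding quotes.
yaml_path() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*$key:[[:space:]]*'\{0,1\}\([^']*\)'\{0,1\}[[:space:]]*\$/\1/p" "$file" | head -n 1
}

check_stores() {
    local conf="$1" key path status=0
    for key in keystore truststore; do
        path=$(yaml_path "$key" "$conf")
        if [ -z "$path" ]; then
            echo "no '$key' entry in $conf" >&2
            status=1
        elif [ ! -r "$path" ]; then
            echo "$key not found or not readable: $path" >&2
            status=1
        else
            echo "$key OK: $path"
        fi
    done
    return "$status"
}
```

For example, check_stores /opt/minifi/conf/config.yml should print two OK lines before you run the start command; a non-zero exit status means a path needs fixing first.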

Launching MiNiFi

Whether or not your NiFi server is SSL-enabled, type this command to launch the MiNiFi agent on your web server:
websites(admin):/opt/minifi/toolkit$ sudo /etc/init.d/minifi start
You can follow the result and progress of MiNiFi in its log files, located in the logs directory of its installation directory (here /opt/minifi/logs).
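To avoid scrolling through the whole log, the sketch below prints only its warning and error lines. It assumes the main log file is logs/minifi-app.log in the installation directory and that, as in NiFi's default logback layout, the level appears as a separate word on each line. minifi_problems is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiNiFi): print the WARN and ERROR lines of
# a MiNiFi log file, e.g. to spot a "remote port ID ... not found" message.
# Assumption: default log file name and level printed as a separate word.
minifi_problems() {
    local log="${1:-/opt/minifi/logs/minifi-app.log}"
    grep -E -- ' (WARN|ERROR) ' "$log"
}
```

Under the same file-name assumption, tail -f /opt/minifi/logs/minifi-app.log follows the log live while MiNiFi starts.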

If you see an error such as “remote port ID <xxx> not found”, look up the Input Port ID in the NiFi GUI (it is shown in the Operate palette on the left of the workspace when the port is selected). If it differs from the one in your config.yml, change it manually in config.yml and restart MiNiFi.
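The manual fix can be sketched as follows: list the IDs present in config.yml, then swap the outdated port ID for the one shown in the NiFi GUI, keeping a backup. It assumes NiFi component IDs appear in config.yml as lowercase UUIDs and that GNU sed (with -i) is available. list_ids and replace_port_id are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MiNiFi): inspect and fix the remote
# Input Port ID stored in config.yml.
# Assumption: NiFi component IDs appear in config.yml as lowercase UUIDs.
UUID_RE='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'

# List every UUID in the file (processor, connection and port IDs alike).
list_ids() {
    grep -Eo -e "$UUID_RE" -- "$1" | sort -u
}

# Replace an outdated ID with the one shown in the NiFi GUI,
# keeping a copy of the previous file as <file>.bak.
replace_port_id() {
    local old="$1" new="$2" conf="$3" id
    for id in "$old" "$new"; do
        # Only accept UUIDs, so the sed expression contains no metacharacters.
        printf '%s\n' "$id" | grep -Eqx -e "$UUID_RE" || { echo "not a UUID: $id" >&2; return 1; }
    done
    grep -qF -e "$old" -- "$conf" || { echo "$old not found in $conf" >&2; return 1; }
    cp -- "$conf" "$conf.bak"
    sed -i "s/$old/$new/g" "$conf"
}
```

Run these as a user allowed to write /opt/minifi/conf/config.yml, then restart MiNiFi so that it reloads the configuration.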
Finally, check that the MiNiFi service is configured to start automatically when your server reboots.