
Fluency Processing Language Guide

Sections
  1. Fluency Processing Language Guide
    1. Basics
    2. Search defines the Timeframe
      1. Assignment
      2. Aggregate-By
      3. Resulting Table
    3. Summary

Fluency Processing Language (FPL) - River Analytics (Ra) is a programming language designed to analyze and manipulate big data records.

function example()
    search {from="-8h"} sContent("@tags","fpl-example-data")
    let {id, isx5, isprime, odd, even, divisors} = f("@fields")
    aggregate v=values(id) by divisors
    let num_of_ints = listcount(v)
end

stream demo_table=example()

Basics

In teaching Ra, we will focus first on the search and analysis capabilities. This differs only slightly from a database query.

In a database query, there is a SELECT, FROM, WHERE and GROUP BY. In an analytic expression, there is a timeframe (SEARCH), WHERE, ASSIGN, AGGREGATE and BY. Just as in an SQL query, not all the keywords have to appear.

A simple analytic expression can consist of just three steps:
    Timeframe
    Assignment of variables (columns)
    Aggregation

For example, this is a search for the last seven (7) days where we count the number of events for each source.

Timeframe: The ‘search’ command determines the range and source of the search. When no source is given, the source is the main event table by default.

Assignment: We then assign the variable ‘source’ to the field @source by using the field function (f).

Aggregation: An aggregation is a function of the dataset for the provided timeframe. Here the function is simply count(). The variable ‘total’ holds the resulting count for each source.

search { from="-7d<d", to=">d" }
assign source=f("@source")
aggregate total=count() by source

The output of an expression is one or more tables, and these tables are what generate a visualization.

The rest of this introduction will walk through this example and explain the wording of the language. A language’s wording is purposely chosen to allow the programmer to understand how the system will process the code.

image placeholder

Search defines the Timeframe

The selection of data is based on an expression that searches the entire dataset. When doing data analytics, the primary selection is to first define the timeframe. In a data lake this is defined with a start and stop time, while in streaming data it is defined by a sliding window of time. Regardless, the first statement in an analytic expression is the timeframe.

search { from="-7d<d", to=">d" }

The first line in the program is the ‘search’ command. The first difference you might notice is that the search uses curly brackets {}, not parentheses (). When curly brackets are used, the options appear as key-value pairs and are intended for use by the command itself. When parentheses are used, a list of assignments appears. Parentheses are used to pass assignments to another function or block.

Two options appear for the search command: from and to. These tell the process to select data starting at one point in time and ending at another.

The nomenclature for time is logical:

      ◦ m: minutes
      ◦ h: hours
      ◦ d: days
      ◦ w: weeks
      ◦ Mn: months

The greater-than and less-than signs can be seen as arrows on an X-axis. A greater-than sign > points to the end of the time period, while a less-than sign < points to the beginning. So,

      ◦ >d: end of the day
      ◦ <d: start of the day
      ◦ >w: end of the week
      ◦ <w: start of the week
      ◦ >m: end of the minute

There is a relative adjustment that is normally placed at the front of the ‘from’ value. In this example, we want the from to be the beginning of the day, seven days prior. That is written as:

      ◦ -7d<d

The expression ‘-7d<d’ is not the same as ‘<w’. The latter is not seven days prior, but the start of the week. In the US, that is Sunday at midnight (12 a.m.). The distinction makes more sense when you want the data for this month:

     ◦ { from="<Mn", to=">Mn" }
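The from and to notation above can be read as a small calculation on the current time. The following Python sketch is an illustration only, not how FPL itself is implemented: resolve, start_of and end_of are hypothetical names, the end of a period is assumed to be the start of the next one, weeks are assumed to start on Sunday as in the US example, and the relative adjustment is handled only for m, h, d and w.

```python
import re
from datetime import datetime, timedelta

# Length of one unit; months vary in length and are handled separately.
_STEP = {
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}
# Leading relative adjustment such as "-7d" or "-8h" (assumed grammar).
_OFFSET = re.compile(r"^-(\d+)(m|h|d|w)")


def start_of(unit, t):
    """Snap t back to the start of its minute, hour, day, week or month."""
    if unit == "m":
        return t.replace(second=0, microsecond=0)
    if unit == "h":
        return t.replace(minute=0, second=0, microsecond=0)
    day = t.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return day
    if unit == "w":
        # Assumption: the week starts on Sunday (Python: Monday=0 ... Sunday=6).
        return day - timedelta(days=(day.weekday() + 1) % 7)
    if unit == "Mn":
        return day.replace(day=1)
    raise ValueError(f"unknown unit: {unit!r}")


def end_of(unit, t):
    """Assumption: the end of a period is the start of the next period."""
    start = start_of(unit, t)
    if unit == "Mn":
        # Day 28 plus 4 days always lands in the next month.
        return (start.replace(day=28) + timedelta(days=4)).replace(day=1)
    return start + _STEP[unit]


def resolve(expr, now):
    """Turn an expression such as '-7d<d' or '>d' into a datetime."""
    t = now
    match = _OFFSET.match(expr)
    if match:
        t -= int(match.group(1)) * _STEP[match.group(2)]
        expr = expr[match.end():]
    if not expr:
        return t  # a bare adjustment such as '-8h'
    if expr[0] == "<":
        return start_of(expr[1:], t)
    if expr[0] == ">":
        return end_of(expr[1:], t)
    raise ValueError(f"cannot parse: {expr!r}")


now = datetime(2024, 5, 15, 13, 45, 30)  # a Wednesday
week_ago = resolve("-7d<d", now)   # start of the day, seven days back
week_start = resolve("<w", now)    # start of the current week
```

Under these assumptions, week_ago falls on 8 May while week_start falls on Sunday 12 May, which is the difference between ‘-7d<d’ and ‘<w’ described above.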

Assignment

The ‘assign’ statement assigns the expression on the right to the variable on the left. It is called an assignment because the variable name references the column and is used as its handler. The value of the column is calculated only once.

assign source=f("@source")

In this example, the field called ‘@source’ is assigned to the variable handler ‘source’.

The f() function refers to the field. This is the most common function in the Ra programming language. Notice that f() has parentheses. It is a function, not a command. It returns an object of that field. In this case, the object is a string: the value of @source.

It could have returned a JSON object, in which case the left side of the expression can have multiple assignments. We will cover that later.

What is important to understand is that an ‘assign’ is a mapping from the record to a variable handler that represents a column.
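The mapping from record to column can be pictured with plain Python dictionaries. This is only a sketch of the idea: the f function here is a hypothetical stand-in for FPL's f(), and the records and host names are invented sample data.

```python
def f(record, name):
    """Stand-in for the field function: look up one field of a record."""
    return record.get(name)


# Invented sample records; real events would come from the search step.
records = [
    {"@source": "fw-01", "@fields": {"id": 6, "even": True}},
    {"@source": "fw-02", "@fields": {"id": 7, "even": False}},
]

# Like: assign source=f("@source")
# The column 'source' holds one value per record.
source = [f(r, "@source") for r in records]

# When the field is a JSON object, several columns can be taken from it
# at once, as in the opening example: let {id, ...} = f("@fields")
rows = [{key: f(r, "@fields")[key] for key in ("id", "even")} for r in records]
```

Each variable name on the left becomes a column, and each record contributes one row to it.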

Aggregate-By

This is the main difference between a query expression and an analytic expression. An aggregate is a function that performs a calculation over the dataset. An assignment is a relationship of the record to a variable, while an aggregate is a calculation over the dataset for each group defined by the ‘by’ command. In this case, it is ‘by’ the ‘source’.

aggregate total=count() by source

Think of this like the GROUP BY in an SQL query. The dataset is divided into groups that share the value in ‘by’. Then the function counts the number of records in each group. Other functions that could have been used include:

     ◦ unique(): count the number of unique values in the set.
     ◦ max(): provide the largest value in the set.
     ◦ min(): provide the smallest value in the set.
     ◦ values(): create a list of unique values in the set.
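The grouping behaviour described above can be sketched in Python. This is not FPL's implementation: aggregate is a hypothetical helper, the ‘bytes’ column is invented for the example, and each aggregate function is modelled as a function of the rows in one group.

```python
from collections import defaultdict


def aggregate(rows, by, func):
    """Split rows into groups sharing the 'by' value, then apply func to each."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {key: func(group) for key, group in groups.items()}


# Invented sample rows, as they might look after the assignment step.
rows = [
    {"source": "fw-01", "bytes": 120},
    {"source": "fw-02", "bytes": 300},
    {"source": "fw-01", "bytes": 80},
    {"source": "fw-01", "bytes": 120},
]

# Like: aggregate total=count() by source
total = aggregate(rows, "source", len)

# Rough analogues of max() and unique() on the hypothetical 'bytes' column.
largest = aggregate(rows, "source", lambda g: max(r["bytes"] for r in g))
distinct = aggregate(rows, "source", lambda g: len({r["bytes"] for r in g}))
```

The result has one entry per distinct ‘source’, which is why an aggregate produces a table with one row per group rather than one row per record.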

Resulting Table

Each process has at least one table output. The default number of rows is set to ten (10). This can be changed with the sort command:

search { from="-7d<d", to=">d" }
assign source=f("@source")

aggregate total=count() by source
sort 5 total

image placeholder

This will then generate a table and graph that have five (5) rows. The sort command takes a row limit followed by the variable (column name) to sort by.
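The effect of the row limit can be sketched in Python. The helper sort_rows is hypothetical, the table is invented sample data, and the sort direction (largest values first) is an assumption, since the guide does not state it.

```python
def sort_rows(rows, column, limit=10):
    """Order rows by one column and keep at most 'limit' of them."""
    # Assumption: largest values first; the default limit of ten
    # mirrors the default row count mentioned above.
    return sorted(rows, key=lambda r: r[column], reverse=True)[:limit]


# Invented table of twelve sources with their event totals.
table = [{"source": f"host-{i:02d}", "total": i * 10} for i in range(1, 13)]

top5 = sort_rows(table, "total", limit=5)   # like: sort 5 total
default = sort_rows(table, "total")         # no explicit limit
```

With twelve rows in the table, the explicit limit keeps five rows and the default keeps ten.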

Summary

Ra is a functional programming language that is designed to work on big data. The basic analytic expression is a timeframe, assignment and aggregate.

    • The timeframe is defined by a from and a to value. These are defined relative to time periods of minute, hour, day, week, and month.
    • The assignment command defines the columns and labels them with the variable name.
    • The aggregate variables are the results of a function over the dataset in the timeframe.
    • A sort command determines the order of listing and number of maximum rows.