This blog was originally published on https://vincentbrule.com/
Polynote is a polyglot notebook with first-class Scala support. Nowadays, multiple solutions exist to use Scala inside a Notebook. I will explain why I switched from Jupyter to Polynote for all my notebooks.
This blog post has two parts. We will start by explaining what a notebook is and how you can use it in your everyday developer life. Then, we will discover Polynote, a promising open-source notebook solution with first-class Scala support.
Notebook and its use cases
A notebook is composed of two parts:
- A server running with a kernel corresponding to your language. For example, Almond is a Scala kernel for Jupyter
- A web interface to write your code or text (in Markdown in general)
You can write any language in your web interface as long as it is supported by the kernel running on the server side. Here is a list of all the different kernels for Jupyter notebook. I am pretty sure you can find your favorite language in this list and start using it in a notebook! Once your code is ready, you can evaluate the cell. The code will be sent to the server and evaluated. After that, the server will return the response to the web interface. Finally, the web interface will wrap the result in some HTML to display the result nicely. That’s the general concept. Each solution adds its own features to provide the best end-user experience.
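To make this round trip concrete, here is a minimal Python sketch of the idea. It is only an illustration under simplifying assumptions, not how Jupyter or Polynote is actually implemented: ToyKernel and render are hypothetical names, there is no network layer, and the "kernel" simply evaluates Python code in a namespace that survives between cells.

```python
import contextlib
import html
import io


class ToyKernel:
    """Hypothetical stand-in for a language kernel: it keeps state between cells."""

    def __init__(self):
        # Variables defined by earlier cells live here.
        self.namespace = {}

    def execute(self, source):
        """Evaluate one cell and return whatever it printed."""
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(source, self.namespace)
        return captured.getvalue()


def render(output):
    """Play the role of the web interface: wrap the raw result in HTML."""
    return "<pre>" + html.escape(output) + "</pre>"


kernel = ToyKernel()
kernel.execute("x = 20 + 1")                           # first cell: define a variable
page_fragment = render(kernel.execute("print(x * 2)"))  # second cell: reuse it
```

The second cell can read x because the kernel keeps its namespace alive between evaluations. That persistent state is what makes a notebook feel interactive.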
My use cases
I started using notebooks (with Jupyter) during my university classes, for practical exercises. It was a convenient way for our teachers to prepare lessons: a teacher can put the code, the questions, and some explanations in the same file. Then, as a student, all you have to do is install the notebook solution and start working on the teacher’s file. In my school, we used Jupyter. You can install it with a single pip command and you are ready to start. I recently used the same concept to organize workshops, as you can see with the workshop Introduction to Tensorflow in Scala. I made a Dockerfile with everything needed to start the workshop.
It was a great experience and I kept in mind that notebooks could be useful in my everyday life as a developer. During my graduation internship, I worked on a research project with my supervisor. The difficulty was staying synchronized, because we were not working in the same city. To solve this problem, we used notebooks extensively to exchange on the code. In this case, a notebook brings various advantages. For example, you can:
- run the cells and save the results. For example, with Machine Learning, a cell can take a lot of time to run, so it was an efficient way to share our findings without having to run everything over again
- explain each cell using Markdown, images and LaTeX
- simply share one file that contains everything you need to run it
- experiment with a solution in multiple languages by simply adding kernels on the server side
Notebooks are popular for these reasons and many more. There are many ways to start using a notebook on your machine. In the next part, I will show you why I switched completely to Polynote and why you should try it!
In this part, I will explain the two main advantages of Polynote for me. Then, I will show you other useful features of this solution.
The main advantage of this solution is that it is polyglot. For the moment, Polynote supports five languages, and you can use all five in the same notebook.
As you can see in figure 3, we can easily share Scala data with Python and vice versa. You can find the entire notebook on GitHub. In this example, we receive a list of weather data from the OpenWeather API. The Scala variable datas can be used transparently in Python. These interactions come with some restrictions (for example, it is easier to use case classes). Once again, I advise you to look at the notebooks while reading, since they contain more explanations. In addition, Spark DataFrames and Python Pandas DataFrames interact well with each other.
Polynote wraps the results in HTML and adds some additional visualization features for specific types. Spark DataFrame and Pandas DataFrame have many options for that.
For example, in the output of a Spark DataFrame or Pandas DataFrame, you have a summary of your DataFrame and two buttons with additional options (the icons with a blue circle in figure 4). These buttons open the popup that you can see in figure 5.
In my opinion, the most interesting options are View data and Plot data. With the first one, you can display all your data in a spreadsheet-like view. With the second one, you can plot your data by simply selecting your axes and the type of plot, as you can see in figure 6. It will generate the corresponding block of Vega code for you.
Vega is a declarative language that allows you to create a lot of different designs, as you can see in their examples. If you do not want to use Vega, you can use any plotting library, such as Matplotlib in Python. But I advise you to try Vega and their examples, because you can make powerful and fancy plots to identify edge cases in your data (figure 7). Moreover, Vega works out of the box with Polynote.
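To give an idea of what "declarative" means here, the sketch below builds a small chart specification as a plain Python dictionary and serializes it to JSON. It assumes the Vega-Lite flavour of the grammar (the higher-level language built on top of Vega), and the readings, city names and field names are made up for the illustration. The code that Polynote generates for you may look different.

```python
import json

# Hypothetical sample data; in a real notebook this would come from a DataFrame.
readings = [
    {"city": "Paris", "temp_c": 18.5},
    {"city": "Lyon", "temp_c": 21.0},
    {"city": "Lille", "temp_c": 16.2},
]

# The spec describes WHAT to draw (a bar per city), not HOW to draw it.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": readings},
    "mark": "bar",
    "encoding": {
        "x": {"field": "city", "type": "nominal"},
        "y": {"field": "temp_c", "type": "quantitative"},
    },
}

# Serialized form that a Vega-Lite renderer could consume.
spec_json = json.dumps(spec, indent=2)
```

Changing the plot type is a matter of editing one value (for example "mark": "point"), which is why a plot editor can generate this kind of code from a few clicks.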
The last thing I want to talk about in this section is the WYSIWYG editor (figure 8).
It looks like a small feature but it is useful when you need to style your Markdown snippet and you do not know much about Markdown syntax.
In this menu, you can also open the LaTeX editor (figure 9) to write your formulas in an interactive way.
All these features allow you to make your notebooks understandable and maintainable over time.
Its polyglot support and its many visualization features made me prefer Polynote over other existing solutions. In addition, Polynote brings other improvements that I want to share with you in the following part.
a. Order is important
With Jupyter, all cells work with the same global state. If you work with a big notebook, you can quickly mess up the order of your logic (figure 10). If you want your Jupyter notebooks to stay organized and maintainable, you have to manage everything yourself and be very rigorous, especially if you work with other people on the same notebook.
Polynote does not use a global state. Each cell has its state defined by all the cells above. As indicated in the documentation:
“This is a powerful way to enforce reproducibility in a notebook; it is far more likely that you’ll be able to re-run the notebook from top to bottom if later cells can’t affect earlier cells.”
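The difference between the two models can be simulated in a few lines of Python. This is a simplified sketch, not Polynote's actual mechanism: both functions are hypothetical, and the position-based model is approximated by re-running the cells above the target, which a real notebook does not need to do.

```python
def run_with_global_state(cells, execution_order):
    """Jupyter-like model: every cell reads and writes one shared namespace."""
    namespace = {}
    for index in execution_order:
        exec(cells[index], namespace)
    return namespace


def run_with_positional_state(cells, target):
    """Simplified position-based model: a cell only sees the cells above it."""
    namespace = {}
    for source in cells[: target + 1]:
        exec(source, namespace)
    return namespace


cells = [
    "total = base + 1",  # cell 0 uses a name that is defined further down
    "base = 41",         # cell 1 defines it
]

# Shared state: running cell 1 first lets cell 0 succeed, hiding the ordering bug.
shared = run_with_global_state(cells, execution_order=[1, 0])

# Position-based state: cell 0 cannot see cell 1, so the bug surfaces immediately.
try:
    run_with_positional_state(cells, target=0)
    positional_error = None
except NameError as error:
    positional_error = str(error)
```

With the shared namespace, the notebook "works" only because of the order in which the cells happened to be run. With the position-based model, the same notebook fails with a NameError, which is the behaviour you want if the goal is a notebook that re-runs cleanly from top to bottom.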
The symbol table will summarize all variables defined in the current state. As you can see in figure 11, at the beginning of your notebook, the symbol table is either empty or contains the Spark Session if you have enabled support for Spark in this notebook.
If you run all the cells of this notebook, you will get the same symbol table as in figure 12.
You can see the name of the variable and its type. In addition, you can click on a variable to visualize your data with Polynote’s tools. Python types are wrapped with
If we try the same experiment as we did with Jupyter (figure 10), we get a not found error, as expected (figure 13).
b. Highlighting running code
Polynote will highlight the current running block until it is completed (figure 14). It is a small feature but it was handy when I used Tensorflow with Polynote to quickly detect parts that were taking a long time.
c. Code editing
With previous solutions like Jupyter, I used to keep an IDE open when working with new libraries, because I had no code-editing support at all in the notebook. Polynote implements code-editing capabilities to facilitate your development, such as autocompletion (figure 15).
d. Organization of the dependencies
The last thing I want to talk about is how the dependencies are organized inside a Polynote notebook. Everything is at the top of the notebook, in the Configuration & dependencies section (figure 16).
It is a powerful feature to organize our notebooks. With Jupyter, you have to define your dependencies in a cell, just as you do with your code, so code and dependencies can get mixed up and become confusing if you are not rigorous enough. With Almond and Jupyter, you have to know how to use Coursier, while Polynote takes care of everything for you.
If you use a dependency in all your notebooks, you can define it in the configuration file of Polynote to have it automatically in each new notebook created.
Polynote brings a lot of useful features that make using notebooks with Scala easy and pleasant compared to Jupyter. I really appreciate how notebooks are organized with Polynote, and this is why I have switched to this solution. Indeed, an organized notebook makes it easier to collaborate with others.
Finally, this project is open source, so feel free to contribute if you like the project!
Thanks for reading and I hope you will want to try Polynote! Feel free to contact me if you have any questions about this blog post or the example notebooks.