Apache Zeppelin Vs Jupyter Notebook: A Data Analytics Comparison

Data Analytics tools give Data Scientists and Data Engineers the instruments to find patterns in data and provide business insights to Executives and other stakeholders. Data Analytics Notebooks are programmes that fast-track this process: Data Scientists can quickly develop applications to analyse the data, visualise the results and roll the successful patterns out into production. This shortens the pattern analysis cycle and reduces the hit-and-miss approach common in the development of analytics models.

The Jupyter Notebook is a well-known application that has been used by large organisations such as Google and NASA. It is best suited to working with data that fits into memory. Project Jupyter was spun off in 2014 from the IPython Notebook, which supported only Python as a notebook engine. Jupyter is more flexible and supports a number of programming languages, such as Python, Scala and R. Because it is open source, it has a large community and a wide range of additional software and integrations.

Apache Zeppelin, on the other hand, began in 2013 and later became an Apache Software Foundation project within the Hadoop landscape. While it is also open source, its community is only a small fraction of Jupyter’s. It is better suited to data that is distributed across a Hadoop cluster. Zeppelin’s strengths include dashboard creation and multi-user sharing. Because it is part of the Hadoop landscape, it also integrates well with other Hadoop applications such as Spark, Pig and Hive.

Appearance

The two UIs are similar, but Zeppelin’s biggest advantage is that it allows multiple paragraphs to be arranged side by side in a single row. It also has a simple built-in data visualisation tool for some interpreters and a table output that supports sorting out of the box.
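As a rough illustration of the table output, the sketch below builds the kind of text that Zeppelin's display system renders as a table: output that starts with %table, with columns separated by tabs and rows separated by newlines. The helper name to_zeppelin_table and the sample data are made up for this example, and the code assumes it is printed from a Python paragraph in Zeppelin.

```python
def to_zeppelin_table(header, rows):
    """Format rows as text for Zeppelin's %table display.

    Hypothetical helper for illustration: columns are joined with tabs,
    rows with newlines, and the whole output is prefixed with "%table ".
    """
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "%table " + "\n".join(lines)


# Sample data invented for this sketch.
print(to_zeppelin_table(["name", "size"], [("sun", 100), ("moon", 10)]))
```

Outside Zeppelin this simply prints plain text; the table with its sorting and chart options appears only when a Zeppelin paragraph displays the output.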

Jupyter’s code and text editing seems much more effective, though, with more hot keys and a great auto-completion feature. Another plus is the large number of Python data visualisation libraries that can output images and other interactive content directly in the output of a cell.
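One way libraries achieve this kind of rich output is the _repr_html_ method: when an object that defines it is the last expression in a Jupyter cell, the notebook shows the returned HTML instead of the plain text representation. The following is a minimal sketch; the Summary class and its fields are invented for illustration and are not part of any library.

```python
from html import escape


class Summary:
    """Hypothetical example object that renders itself as HTML in Jupyter."""

    def __init__(self, title, values):
        self.title = title
        self.values = list(values)

    def _repr_html_(self):
        # Jupyter calls this method, if present, to display the object.
        items = "".join(f"<li>{escape(str(v))}</li>" for v in self.values)
        return f"<h4>{escape(self.title)}</h4><ul>{items}</ul>"


# In a notebook, ending a cell with this expression shows a heading and a list.
Summary("Top products", ["apples", "pears"])
```

Run as a plain script, the last line produces no visible output; the rendering only happens inside a notebook front end.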

Multi-User Support & Extensions

Jupyter doesn’t support multi-user configuration by default. Installing JupyterHub adds a service that accepts client connections, authenticates them, and starts a separate Jupyter server for each user. This is not a great solution when a large number of users need to be accommodated, because every additional server adds load on the physical machine.
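The overhead can be pictured with a toy model. The sketch below is not JupyterHub code: ToyHub, its user list and the 512 MB per server are assumptions chosen only to show how one server per user makes resource use grow with the number of connected users.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHub:
    """Toy stand-in for a hub: authenticate, then start one server per user."""

    allowed_users: set
    mem_per_server_mb: int = 512  # assumed figure, not a measurement
    servers: dict = field(default_factory=dict)

    def connect(self, user: str) -> str:
        # Reject anyone who is not on the (hypothetical) allow list.
        if user not in self.allowed_users:
            raise PermissionError(f"unknown user: {user}")
        # Reuse a running server, otherwise start a new one for this user.
        if user not in self.servers:
            self.servers[user] = f"/user/{user}/"
        return self.servers[user]

    def total_memory_mb(self) -> int:
        # Memory grows linearly with the number of running servers.
        return len(self.servers) * self.mem_per_server_mb


hub = ToyHub(allowed_users={"ana", "ben", "cho"})
for name in ["ana", "ben", "ana", "cho"]:
    hub.connect(name)
print(hub.total_memory_mb())
```

In this model a repeat connection from the same user reuses the existing server, so the cost is driven by the number of distinct users rather than the number of connections.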

Zeppelin supports multi-user configuration via LDAP/Active Directory connectivity and specifically defined security groups. It uses only one server process, which authenticates users against the configured system before allowing further access. Zeppelin also provides a number of interpreters that give access to library functions. The most common are Spark and Scala, which open the tool to a large number of specialised maths and statistics libraries. The same is true of Jupyter, which has more extensions available because of its huge community and its standalone nature.

In conclusion, Zeppelin is the better tool if the data scientist develops in the Hadoop world. It integrates well with other Hadoop systems such as Spark and Pig, and streamlines the development of Spark applications. It also supports larger teams better and seems more geared towards enterprise users, with great LDAP integration, permissions management, and so on.

Jupyter requires less overhead in the setup and productionisation of developed patterns because of its standalone nature. Thanks to its large number of extensions and integrations, specifically with Machine Learning and AI frameworks, it has become the more popular choice among analytics users.

To discuss the Jupyter and Zeppelin comparison, or the pros and cons of each platform, get in touch with Fusion Professionals.
