Hadoop: Difference between revisions

From Chorke Wiki
Jump to navigation Jump to search
Line 7: Line 7:


== Knowledge ==
== Knowledge ==
  sudo apt install ssh
{|
  sudo apt install pdsh
|valign="top" colspan="2"|
readlink -f /usr/bin/java | sed "s:bin/java::"
  sudo apt-get install pdsh
  sudo apt-get install ssh
 
|-
|colspan="2"|
----
|-
|valign="bottom"|
  sudo apt dist-upgrade
  sudo apt dist-upgrade
  sudo do-release-upgrade
  sudo do-release-upgrade
   
   
  sudo usermod -aG sudo hadoop
  sudo apt --fix-broken install
  readlink -f /usr/bin/java | sed "s:bin/java::"
  sudo apt install ubuntu-desktop
 
|valign="top"|
<source lang="ini">
[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143
</source>
 
|}


==References==
==References==

Revision as of 03:05, 19 October 2022

Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open source project in the big data playing field and is sponsored by the Apache Software Foundation. Hadoop is comprised of four main layers:

  1. Hadoop Common is the collection of utilities and libraries that support other Hadoop modules.
  2. HDFS, which stands for Hadoop Distributed File System, is responsible for persisting data to disk.
  3. YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
  4. MapReduce is the original processing model for Hadoop clusters. It distributes work within the cluster or map, then organizes and reduces the results from the nodes into a response to a query. Many other processing models are available for the 3.x version of Hadoop

Knowledge

readlink -f /usr/bin/java | sed "s:bin/java::"
sudo apt-get install pdsh
sudo apt-get install ssh

sudo apt dist-upgrade
sudo do-release-upgrade

sudo apt --fix-broken install
sudo apt install ubuntu-desktop
[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143

References