macs.hw.ac.uk Comparing Fork/Join and MapReduce scalability
zetta.net
mapr.com
wintelguy.com
Mean Time to Data Loss (MTTDL). Inputs to the MTTDL model:
  • The number of hard drives (data set size/system performance)
  • The reliability of each hard drive
  • The probability of reading a given hard drive correctly without error
  • The redundancy encoding of the system
  • The rebuild rate
Data loss from drive failures may not be very probable, but,
shockingly, data corruption has caused most outages
MTTDL calculator for RAID
scalability
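The MTTDL inputs listed above can be combined into a rough estimate. The sketch below is a first-order approximation in the spirit of the classic RAID reliability model, MTTDL ≈ MTTF^2 / (N * (N - 1) * MTTR) for one single-parity group, extended with an unrecoverable-read-error (URE) term. It assumes independent, exponentially distributed drive failures and MTTR much smaller than MTTF; the function and parameter names are hypothetical and not taken from the linked calculators.

```python
import math

def raid5_mttdl_hours(n_drives, mttf_hours, capacity_tb, rebuild_mb_per_s,
                      bit_error_rate=0.0):
    """Hypothetical helper: approximate MTTDL (hours) of one single-parity group."""
    # Rebuild rate -> MTTR: time needed to rewrite one whole drive.
    capacity_bytes = capacity_tb * 1e12
    mttr_hours = capacity_bytes / (rebuild_mb_per_s * 1e6) / 3600
    # Redundancy: with a single parity drive, a second drive failure
    # during the rebuild loses data (first-order, valid when MTTR << MTTF).
    p_second_failure = (n_drives - 1) * mttr_hours / mttf_hours
    # Read reliability: the rebuild reads every bit of the n-1 survivors,
    # and a single URE also loses data. Computes 1 - (1 - ber)^bits.
    bits_read = (n_drives - 1) * capacity_bytes * 8
    p_ure = -math.expm1(bits_read * math.log1p(-bit_error_rate))
    # Drive count and reliability: first failures arrive at rate n / MTTF.
    loss_rate = n_drives / mttf_hours * (p_second_failure + p_ure)
    return 1.0 / loss_rate

# Assumed example: 8 x 4 TB drives, 1,000,000 h MTTF, 100 MB/s rebuild,
# and 1 URE per 1e14 bits read.
without_ure = raid5_mttdl_hours(8, 1e6, 4, 100)
with_ure = raid5_mttdl_hours(8, 1e6, 4, 100, bit_error_rate=1e-14)
print(f"{without_ure:.3g} h vs {with_ure:.3g} h")
```

With `bit_error_rate=0` the function reduces to the classic formula; for large drives the URE term can dominate the estimate.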
vtagion.com Scale Up (Vertical) and Scale Out (Horizontal) scaling scalability
apache.org HDFS Architecture hdfsa hdfsdn hdfs
matthewrathbone.com HDFS Cheat Sheet by Matthew Rathbone hdfs
kdnuggets.com Data Lake vs Data Warehouse dlwd hdfs
databricks.com A data lakehouse combines the flexibility, cost-efficiency, and scale of data lakes
with the data structures, data management (DM), and ACID transaction features of data warehouses data-lakehouse
hdfs
berkeley.edu The Berkeley Data Analytics Stack (BDAS) bdas big-data
matthewrathbone.com Many map-reduce examples map-reduce
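As a minimal, in-memory illustration of the map, shuffle, and reduce phases that the linked examples walk through, here is the classic word count in plain Python. It only simulates the data flow; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all values by key and order groups by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Reducer: sum the counts collected for one word.
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle(mapped)]

print(word_count(["the cat", "The dog"]))  # [('cat', 1), ('dog', 1), ('the', 2)]
```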
platfora.com Handling Skew in a Map-Reduce map-reduce
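One widely used remedy for a skewed (hot) key is key salting with two-stage aggregation; the linked article may cover other techniques too. The toy sketch below assumes that each (key, salt) bucket stands in for work a separate reducer would do; `salted_counts` and `num_salts` are hypothetical names.

```python
import random
from collections import Counter

def salted_counts(keys, num_salts, seed=0):
    """Count keys in two stages so a hot key is split across several buckets."""
    rng = random.Random(seed)
    # Stage 1: attach a random salt; each (key, salt) bucket could go to a
    # different reducer, spreading one hot key's records over several tasks.
    partial = Counter()
    for key in keys:
        partial[(key, rng.randrange(num_salts))] += 1
    # Stage 2: strip the salt and merge the (now small) partial results.
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count
    return partial, final

records = ["hot"] * 1000 + ["cold"] * 10
partial, final = salted_counts(records, num_salts=4)
# Total for the hot key, and the size of its largest single bucket.
print(final["hot"], max(c for (k, _), c in partial.items() if k == "hot"))
```

The price is a second aggregation pass, and it only works for operations that can be merged from partial results (counts, sums, maxima).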
github.com The Hadoop Ecosystem hadoop
apache.org
tutorialspoint.com
Hadoop setup (set JAVA_HOME in etc/hadoop/hadoop-env.sh)
NameNode and DataNode start/stop: sbin/start-dfs.sh and sbin/stop-dfs.sh
YARN start/stop: sbin/start-yarn.sh and sbin/stop-yarn.sh
Local Hadoop ResourceManager URL and local Hadoop file manager URL
hadoop
acadgild.com
saphanatutorial.com
How YARN helped Hadoop
stackoverflow.com
stackoverflow.com
Hadoop connection refused issue hadoop
stackoverflow.com How does partitioning work for data from files on HDFS hadoop
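For MapReduce input, the number of map tasks follows the input splits computed from a file's length and its HDFS block size. The sketch below is a simplified model of the split logic in Hadoop's `FileInputFormat` (new `mapreduce` API): split size = max(minSize, min(maxSize, blockSize)), with a 10% slop so a small tail is merged into the last split. It ignores block locations and non-splittable (for example gzip-compressed) files, sizes are in arbitrary units (MB below), and other engines reading HDFS may partition differently.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

def compute_split_size(block_size, min_size, max_size):
    return max(min_size, min(max_size, block_size))

def get_splits(file_length, block_size, min_size=1, max_size=2**63 - 1):
    """Return (offset, length) pairs covering the whole file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = file_length
    # Cut full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most 1.1 x split_size) becomes the final split.
    if remaining != 0:
        splits.append((file_length - remaining, remaining))
    return splits

print(get_splits(300, 128))  # [(0, 128), (128, 128), (256, 44)]
print(get_splits(140, 128))  # [(0, 140)]: the 12 MB tail stays in one split
```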
apache.org
datametica.com
Hadoop commands hadoop
stackoverflow.com
Reason for the _SUCCESS and part-r-00000 output files hadoop-output hadoop
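The `part-r-00000` naming comes from reducer output: each reduce task writes its own file (`-r-` for reduce, `-m-` for map-only jobs, followed by a zero-padded task number), and an empty `_SUCCESS` marker appears once the job commits successfully. The sketch below imitates that layout on a local directory. The partition formula mirrors Hadoop's default `HashPartitioner`, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but `zlib.crc32` stands in for Java's `hashCode()`, and `partition` and `write_job_output` are hypothetical helpers.

```python
import os
import tempfile
import zlib

def partition(key, num_reducers):
    # HashPartitioner-style formula; crc32 is a deterministic stand-in for
    # Java's key.hashCode() (Python's built-in str hash varies per process).
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def write_job_output(counts, out_dir, num_reducers):
    buckets = [[] for _ in range(num_reducers)]
    for key, value in sorted(counts.items()):
        buckets[partition(key, num_reducers)].append((key, value))
    # One part file per reducer, even if that reducer received no keys.
    for r, rows in enumerate(buckets):
        path = os.path.join(out_dir, f"part-r-{r:05d}")
        with open(path, "w", encoding="utf-8") as f:
            for key, value in rows:
                f.write(f"{key}\t{value}\n")  # tab-separated, like TextOutputFormat
    # Empty marker, written only after every part file is complete.
    open(os.path.join(out_dir, "_SUCCESS"), "w").close()

out_dir = tempfile.mkdtemp()
write_job_output({"cat": 1, "dog": 1, "the": 2}, out_dir, num_reducers=2)
print(sorted(os.listdir(out_dir)))  # ['_SUCCESS', 'part-r-00000', 'part-r-00001']
```

Downstream jobs can check for `_SUCCESS` before reading the directory, since its absence suggests the output may be incomplete.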
apache.org Apache Parquet (columnar storage format) hadoop
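The benefit of a columnar format such as Parquet is that the values of one column are stored together, so a query can read only the columns it needs, and runs of similar values compress well. The toy sketch below converts rows to columns and run-length encodes one column; it illustrates the idea only and is not Parquet's actual file layout or encoding. The helper names are hypothetical.

```python
from itertools import groupby

def to_columnar(rows):
    """Row layout -> column layout (assumes every row has the same fields)."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}

def run_length_encode(values):
    # Consecutive repeated values collapse into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]

rows = [
    {"user": "a", "country": "UK", "clicks": 3},
    {"user": "b", "country": "UK", "clicks": 5},
    {"user": "c", "country": "US", "clicks": 2},
]
columns = to_columnar(rows)
# Projection: summing clicks touches one column, not every field of every row.
print(sum(columns["clicks"]))  # 10
print(run_length_encode(columns["country"]))  # [('UK', 2), ('US', 1)]
```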
wordpress.com
berkeley.edu
Spark white paper and resources
My dedicated Spark page
spark
hive.org Apache Hive resources
My dedicated Hive page
hive
alvinalexander.com
stackoverflow.com
Akka example
Spark no longer uses Akka
Akka uses fibers, which are slightly different from threads
actor
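The actor model behind Akka can be sketched in a few lines: an actor owns private state and a mailbox and processes one message at a time, so its state needs no locks. This is a toy Python illustration; `CounterActor`, `tell`, and `ask` are hypothetical names echoing Akka's vocabulary, not Akka's API, and giving each actor its own OS thread is a simplification (Akka schedules many actors on a shared dispatcher).

```python
import queue
import threading

class CounterActor:
    """Toy actor: private state, a FIFO mailbox, one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        # Fire-and-forget send: enqueue and return immediately.
        self._mailbox.put(message)

    def ask(self, timeout=5.0):
        # Request/reply: send a one-shot reply queue and wait on it.
        reply = queue.Queue(maxsize=1)
        self.tell(("get", reply))
        return reply.get(timeout=timeout)

    def _run(self):
        # Messages are handled strictly one at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            kind = message[0]
            if kind == "increment":
                self._count += message[1]
            elif kind == "get":
                message[1].put(self._count)
            elif kind == "stop":
                return

actor = CounterActor()
for _ in range(100):
    actor.tell(("increment", 1))
print(actor.ask())  # 100: the "get" is queued behind all 100 increments
actor.tell(("stop",))
```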
paralleluniverse.co
zeroturnaround.com
Fiber-based actor framework: Quasar thread fiber
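A fiber is a unit of execution scheduled in user space that yields control cooperatively, whereas the OS preempts threads. Python generators serve below only as an analogy: one OS thread runs a round-robin scheduler that resumes each suspended task in turn. This is not Quasar's API, and `worker` and `run` are hypothetical names.

```python
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run(tasks):
    # A single OS thread; the scheduler resumes each suspended task in turn.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)

log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Switching between tasks here is an ordinary function return and resume, with no kernel context switch, which is why fibers are cheap enough to create in very large numbers.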
tachyon-project.org
google.com
berkeley.edu
Tachyon as storage tachyon
github.com Tachyon build scratch-pad tachyon
reactivemanifesto.org
reactive-streams.org
Reactive Manifesto and Reactive Streams reactive
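Reactive Streams centers on backpressure: a subscriber signals demand with `request(n)`, and a publisher may emit at most that many items. The sketch below is a simplified, synchronous Python mirror of the Java interfaces (`subscribe`, `on_subscribe`, `on_next`, `on_complete`, `request`, `cancel`). It is not spec-compliant (no `on_error`, no thread safety), and `RangePublisher` and `BatchingSubscriber` are hypothetical.

```python
class RangePublisher:
    """Emits 0..n-1, but never more items than the subscriber requested."""

    def __init__(self, n):
        self.n = n

    def subscribe(self, subscriber):
        subscriber.on_subscribe(_RangeSubscription(self.n, subscriber))

class _RangeSubscription:
    def __init__(self, n, subscriber):
        self.n, self.subscriber = n, subscriber
        self.next_value = 0
        self.demand = 0
        self.done = False
        self.emitting = False

    def request(self, k):
        if k <= 0:
            raise ValueError("request(n) requires n > 0")  # real spec: on_error
        self.demand += k
        if self.emitting:
            return  # re-entrant call from on_next: the running loop sees the demand
        self.emitting = True
        while self.demand > 0 and not self.done and self.next_value < self.n:
            self.demand -= 1
            value = self.next_value
            self.next_value += 1
            self.subscriber.on_next(value)
        self.emitting = False
        if not self.done and self.next_value >= self.n:
            self.done = True
            self.subscriber.on_complete()

    def cancel(self):
        self.done = True

class BatchingSubscriber:
    """Requests `batch` items and asks for more only after consuming them."""

    def __init__(self, batch):
        self.batch = batch
        self.received = []
        self.requests = []  # history of request sizes, to show backpressure
        self.completed = False
        self._subscription = None
        self._outstanding = 0

    def on_subscribe(self, subscription):
        self._subscription = subscription
        self._request_more()

    def _request_more(self):
        self._outstanding = self.batch
        self.requests.append(self.batch)
        self._subscription.request(self.batch)

    def on_next(self, item):
        self.received.append(item)
        self._outstanding -= 1
        if self._outstanding == 0:
            self._request_more()

    def on_complete(self):
        self.completed = True

subscriber = BatchingSubscriber(batch=2)
RangePublisher(5).subscribe(subscriber)
print(subscriber.received, subscriber.requests)  # [0, 1, 2, 3, 4] [2, 2, 2]
```

The `emitting` flag keeps a `request` made from inside `on_next` from recursing, so demand accumulates and the outer loop keeps delivering.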
peterjames.com The big data and analytics dictionary big data
github.com A curated list of awesome big data frameworks and resources big data
google.com Overview of Cloud Bigtable GCP
github.com A curated list of machine learning frameworks, libraries and software machine-learning
cloudera.com cloudera
informit.com Big Data Adoption and Planning Considerations reactive
github.com GitHub Blockchain references blockchain