Big Data Distributed file systems

Big Data: distributed file systems, the difference between hadoop fs and hdfs dfs, and the distributed file systems supported by Hadoop
Dr.GordenMorse, France, Professional
Published Date: 22-07-2017
Ghislain Fourny, Big Data, 4. Distributed file systems

So far, we have rehearsed relational databases, looked into scaling out, and seen a simple model for object storage.

Why? There is Big Data... and there is Big Data.

Use cases: a huge amount of large files, or a large amount of huge files? Object storage handles billions of TB files; file storage handles millions of PB files.

Where does the data come from? Raw data: sensors, measurements, events, logs. Derived data: aggregated data, intermediate data.

Technologies and models: the key-value store model with object storage (billions of TB files) vs. the file system model with block storage (millions of PB files).

Distributed file systems: inception.

GFS genesis: characteristics lead to requirements, and requirements lead to the file system design.

Fault tolerance and robustness: a local disk might fail; in a cluster with 100s to 10,000s of machines, nodes will fail. Fault tolerance therefore calls for monitoring, error detection, and automatic recovery.

File update model: random access vs. upsert/append only. The append-only model suits immutable data such as sensor data, logs, and intermediate data.

Appends: 100s of clients append in parallel, and each append is atomic (GFS only).

Performance requirements: the top priority is throughput; latency is secondary.

The progress made (1956-2010), on a logarithmic scale: capacity 622,100,131x, throughput 11,719x, latency 8x (Source: Michael E. Friske, Claus Mikkelsen, The History of Storage, SHARE 2014). Capacity has grown far faster than throughput, hence: parallelize. Throughput has grown far faster than latency, hence: batch processing.
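The capacity, throughput, and latency figures above motivate two design choices: read from many disks in parallel, and read large amounts sequentially (batch processing) instead of issuing many small random reads. A back-of-the-envelope sketch in Python makes the arithmetic concrete. The disk size, throughput, access latency, and block size are hypothetical round numbers, not figures from the lecture or from the cited source, and the helper function names are invented for illustration.

```python
# Back-of-the-envelope sketch: why a distributed file system favours
# parallel, sequential (batch) reads over random access.
# All hardware figures below are HYPOTHETICAL round numbers for illustration.

DISK_CAPACITY_BYTES = 10 * 10**12   # assumed: 10 TB per disk
DISK_THROUGHPUT_BPS = 200 * 10**6   # assumed: 200 MB/s sequential read
DISK_LATENCY_S = 0.010              # assumed: 10 ms per random access
BLOCK_SIZE_BYTES = 4 * 1024         # assumed: 4 KiB per random read


def sequential_scan_seconds(data_bytes: int, disks: int = 1) -> float:
    """Time to stream data_bytes spread evenly over `disks` disks,
    all read in parallel at full sequential throughput."""
    return data_bytes / (DISK_THROUGHPUT_BPS * disks)


def random_access_seconds(data_bytes: int, disks: int = 1) -> float:
    """Time to read the same data as independent small random reads:
    each read pays one access latency plus the transfer of one block,
    and the reads are shared evenly across `disks` disks."""
    reads = data_bytes / BLOCK_SIZE_BYTES
    per_read = DISK_LATENCY_S + BLOCK_SIZE_BYTES / DISK_THROUGHPUT_BPS
    return reads * per_read / disks


if __name__ == "__main__":
    data = DISK_CAPACITY_BYTES  # one full disk's worth of data
    scenarios = [
        ("1 disk, sequential", sequential_scan_seconds(data)),
        ("1,000 disks, sequential", sequential_scan_seconds(data, disks=1000)),
        ("1,000 disks, random 4 KiB reads", random_access_seconds(data, disks=1000)),
    ]
    for label, seconds in scenarios:
        print(f"{label:34s} {seconds:>12,.0f} s")
```

Under these assumptions, scanning one full disk takes 50,000 s (about 14 hours); spreading the same data over 1,000 disks brings the scan down to 50 s, whereas reading it as random 4 KiB blocks still takes roughly 24,000 s even on 1,000 disks, because every read pays the access latency. Parallelism closes the capacity/throughput gap; sequential batch reads avoid paying latency over and over.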