|Tool Name||Hadoop Distributed File System (HDFS)|
|Tool Web Site||http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html|
|Supported Methodology||[Flat File] Multi-Model, Data Store (Physical Data Model) via Java API|
Import tool: Apache Hadoop Distributed File System (HDFS) 2.5 (http://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html)
Import interface: [Flat File] Multi-Model, Data Store (Physical Data Model) via Java API from Apache Hadoop Distributed File System (HDFS Java API)
Import bridge: 'ApacheHDFS' 10.0.0
The bridge uses the Apache Hadoop HDFS Java library (JARs) to access the Hadoop file system.
The library JAR files are located in the /java/Hadoop directory.
Specifying a Configuration files directory is often sufficient, because the values for the other bridge parameters may be specified in those files.
The bridge supports the following metadata formats:
- COBOL copybook
The bridge supports the following compressed formats:
- ZIP (as a compression format, not as archive format)
- Snappy (as standard Snappy format, not as Hadoop native Snappy format)
|Configuration files directory|| Directory containing core-site.xml and hdfs-site.xml for your environment.
This optional parameter lets you reuse your existing configuration files instead of specifying the Hadoop connection and Kerberos security details manually through the other parameters.
To specify the details manually, leave this parameter empty. If you specify a directory that does not contain the configuration files, the bridge exits with an error.
You can override the parameters available in the configuration files using the bridge parameters.
For example, you can override the fs.default.name file parameter using the NameNode URI bridge parameter.
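The override behavior described above can be pictured with a minimal sketch. It reads a core-site.xml document with Python's standard library and overlays a bridge parameter value on top of it. The function names, the sample XML, and the host names are hypothetical, and the rule that an empty parameter leaves the file value in place is an assumption; the bridge itself uses the Hadoop Java library and its actual precedence logic is not shown in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, used only for illustration.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""


def read_hadoop_properties(xml_text):
    """Return the name/value pairs of a Hadoop *-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_properties(file_props, overrides):
    """Overlay non-empty bridge parameter values on the file values (assumed rule)."""
    merged = dict(file_props)
    merged.update({key: val for key, val in overrides.items() if val})
    return merged


file_props = read_hadoop_properties(SAMPLE_CORE_SITE)
merged = effective_properties(file_props, {"fs.default.name": "hdfs://other-host:8020"})
print(merged["fs.default.name"])
```

In this sketch an empty override is ignored, so leaving the NameNode URI parameter blank keeps the value from the file.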
|NameNode URI||URI of the Hadoop NameNode, for example hdfs://host:8020
|Root directory||Enter the directory containing the metadata files or select it using the browsing tool. The bridge supports browsing up to 3 levels deep.||REPOSITORY_MODEL|
|Include filter||Case-sensitive filter of files, relative to the root path, expressed using the extended Unix glob pattern syntax (e.g. '/*.csv' imports files that end with .csv in the root folder; '**.csv' imports all .csv files at any directory level).
|Exclude filter||Similar to the include filter parameter||STRING|
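To make the two example patterns concrete, here is a minimal sketch of one possible reading of the glob syntax. It is an assumption, not the bridge's actual matcher: `**` is taken to cross directory levels, `*` and `?` are taken to stay within one level, and paths are written relative to the root with a leading `/`. The helper names `glob_to_regex` and `select_files` are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified glob into an anchored, case-sensitive regex.

    Assumed semantics: '**' crosses directory levels, '*' and '?' do not.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def select_files(paths, include, exclude=None):
    """Keep paths that match the include glob and do not match the exclude glob."""
    inc = glob_to_regex(include)
    exc = glob_to_regex(exclude) if exclude else None
    return [p for p in paths if inc.match(p) and not (exc and exc.match(p))]


paths = ["/a.csv", "/data/b.csv", "/data/old/c.csv", "/A.CSV", "/notes.txt"]
root_only = select_files(paths, "/*.csv")
any_level = select_files(paths, "**.csv", exclude="/data/old/**")
print(root_only)
print(any_level)
```

Under these assumptions the first call keeps only `/a.csv`, and the second keeps `/a.csv` and `/data/b.csv`. `/A.CSV` is rejected in both cases because matching is case sensitive.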
|Partition directories||Paths of file-based partition directories.
The bridge tries to detect partitions automatically. Detection can take a long time when partitions contain many files.
You can shortcut the detection process for a partition by specifying it in this parameter.
Specify the partition directory path relative to the Root directory.
Use . to specify the root directory as the partitioned directory.
Separate multiple paths with the , or ; characters.
For example: dir1/dir2,dir3/dir4,dir5
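The rules for this parameter (paths relative to the Root directory, `.` for the root itself, entries separated by `,` or `;`) can be sketched as follows. The function name and the `/data/landing` root are hypothetical, and trimming whitespace around entries is an assumption; this only illustrates how such a value might be interpreted.

```python
import posixpath
import re


def parse_partition_directories(value, root):
    """Split on ',' or ';' and resolve each entry against the root directory."""
    entries = [entry.strip() for entry in re.split(r"[,;]", value) if entry.strip()]
    # normpath collapses "." so that it resolves to the root directory itself.
    return [posixpath.normpath(posixpath.join(root, entry)) for entry in entries]


resolved = parse_partition_directories("dir1/dir2,dir3/dir4;dir5", "/data/landing")
root_as_partition = parse_partition_directories(".", "/data/landing")
print(resolved)
print(root_as_partition)
```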
|Sample size||Enter the directory containing metadata files or specify it using browsing tool. Bridge provides up to 3 level browsing depth.||NUMERIC|
|Hadoop properties|| Custom Hadoop and HDFS configuration properties.
The bridge uses a default configuration to access a Hadoop distribution. If you need to use a custom configuration, specify its parameter values here.
For further information about the properties required by Hadoop and its related systems such as HDFS and Hive, see the documentation of the Hadoop distribution you are using or see Apache's Hadoop documentation on http://hadoop.apache.org/docs and then select the version of the documentation you want. For demonstration purposes, the links to some properties are listed below:
Typically, the HDFS-related properties can be found in the hdfs-default.xml file of your distribution, such as
|Keytab file||Full path to the Kerberos keytab file. The file is necessary to log into a Kerberos-enabled Hadoop system. It contains pairs of Kerberos principals and encrypted keys. You need to enter the Principal using the Principal user parameter.
The user that runs the bridge is not necessarily the one the Principal designates but must have the right to read the keytab file being used. For example, the user name you are using to run the bridge is UserA and the principal to be used is UserB; in this situation, ensure that UserA has the right to read the keytab file to be used.
|Principal||User principal name. See the “Keytab file” parameter documentation for details.||STRING|
|Username||User name used for HDFS authentication. Sometimes referred to as the proxy name.
The parameter is only used for Kerberos authentication.
It does not affect the user that runs the bridge.
|HDFS encryption key provider (KMS)|| The location of the KMS proxy. For example, kms://http@localhost:16000/kms.
Specify the HDFS encryption key provider only when the HDFS transparent encryption has been enabled in your cluster. Leave the value empty otherwise.
For further information about the HDFS transparent encryption and its KMS proxy, see Transparent Encryption in HDFS at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html.
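The KMS location in the example embeds the transport protocol before the `@` sign. As a sketch of how such a value can be read, the following helper converts a `kms://` URI into the HTTP endpoint it appears to designate. The function name is hypothetical and the mapping is an assumption based on the form of the example above; consult the Hadoop documentation for the authoritative rules.

```python
from urllib.parse import urlsplit


def kms_uri_to_endpoint(kms_uri):
    """Convert kms://<protocol>@<host>:<port>/<path> to <protocol>://<host>:<port>/<path>."""
    parts = urlsplit(kms_uri)
    if parts.scheme != "kms" or "@" not in parts.netloc:
        raise ValueError("expected a URI of the form kms://<protocol>@<host>:<port>/<path>")
    protocol, _, authority = parts.netloc.partition("@")
    if protocol not in ("http", "https"):
        raise ValueError("unsupported protocol: " + protocol)
    return protocol + "://" + authority + parts.path


endpoint = kms_uri_to_endpoint("kms://http@localhost:16000/kms")
print(endpoint)
```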
|Miscellaneous||Specify miscellaneous options identified with a -letter and value.
For example, -m 4G -f 100 -j -Dname=value -Xms1G
-m the maximum Java memory size as a whole number (e.g. -m 4G or -m 2500M).
-v set environment variable(s) (e.g. -v var1=value -v var2="value with spaces").
-j the last option that is followed by Java command line options (e.g. -j -Dname=value -Xms1G).
-hadoop key1=val1;key2=val2 to manually set Hadoop configuration options
-tps 10 maximum thread pool size
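To show how the options listed above combine in a single string, here is a minimal sketch of a parser for it. This is not the bridge's own parser: the function name, the result dictionary layout, and the handling of options not described above (collected under `other`) are assumptions made for illustration.

```python
import shlex


def parse_misc_options(text):
    """Parse a Miscellaneous option string into a dictionary (illustrative only)."""
    tokens = shlex.split(text)  # honors double quotes, e.g. var2="value with spaces"
    opts = {"env": {}, "hadoop": {}, "java": [], "other": {}}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag == "-j":
            # -j is the last option; everything after it is passed to Java.
            opts["java"] = tokens[i + 1:]
            break
        if i + 1 >= len(tokens):
            raise ValueError("missing value for option " + flag)
        value = tokens[i + 1]
        if flag == "-m":
            opts["max_memory"] = value
        elif flag == "-v":
            name, _, val = value.partition("=")
            opts["env"][name] = val
        elif flag == "-hadoop":
            for pair in value.split(";"):
                if pair:
                    key, _, val = pair.partition("=")
                    opts["hadoop"][key] = val
        elif flag == "-tps":
            opts["thread_pool_size"] = int(value)
        else:
            opts["other"][flag] = value
        i += 2
    return opts


parsed = parse_misc_options(
    '-m 4G -v var2="value with spaces" -hadoop key1=val1;key2=val2 -tps 10 '
    "-j -Dname=value -Xms1G"
)
print(parsed)
```

Stopping at `-j` mirrors the statement that it is the last option and that the remaining tokens are Java command line options.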
Mapping information is not available