Spark-Redis is a package for reading and writing data in Redis from Spark. It supports all of Redis's data structures: String, Hash, List, Set, and Sorted Set. The package works both with standalone Redis and with Redis Cluster, and it also provides support for Spark Streaming.
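As a quick illustration, here is a minimal sketch of reading and writing Redis strings from Spark, assuming the spark-redis dependency is on the classpath and that its `toRedisKV`/`fromRedisKV` helpers are in scope via the `com.redislabs.provider.redis._` import; the host, port, keys, and values below are placeholders, not values from the original.

import org.apache.spark.{SparkConf, SparkContext}
import com.redislabs.provider.redis._

object SparkRedisSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a Redis instance is reachable at localhost:6379.
    val conf = new SparkConf()
      .setAppName("spark-redis-sketch")
      .setMaster("local[*]")
      .set("spark.redis.host", "localhost")
      .set("spark.redis.port", "6379")
    val sc = new SparkContext(conf)

    // Write an RDD of (key, value) pairs as Redis strings.
    val kvRDD = sc.parallelize(Seq(("user:1", "alice"), ("user:2", "bob")))
    sc.toRedisKV(kvRDD)

    // Read back every string key matching a pattern as an RDD[(String, String)].
    val users = sc.fromRedisKV("user:*")
    users.collect().foreach(println)

    sc.stop()
  }
}

spark-redis exposes analogous readers and writers for the other structures (hashes, lists, sets, sorted sets), so the same pattern carries over.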
/**
 * Read a directory of text files from HDFS, a local file system (available on all nodes), or any
 * Hadoop-supported file system URI. Each file is read as a single record and returned in a
 * key-value pair, where the key is the path of each file and the value is the content of each
 * file.
 *
 * <p> For example, if you have the following files:
 * {{{
 *   hdfs://a-hdfs-path/part-00000
 *   hdfs://a-hdfs-path/part-00001
 *   ...
 *   hdfs://a-hdfs-path/part-nnnnn
 * }}}
 *
 * Do `val rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,
 *
 * <p> then `rdd` contains
 * {{{
 *   (a-hdfs-path/part-00000, its content)
 *   (a-hdfs-path/part-00001, its content)
 *   ...
 *   (a-hdfs-path/part-nnnnn, its content)
 * }}}
 *
 * @note Small files are preferred; large files are also allowed, but may cause poor performance.
 * @note On some filesystems, `.../path/*` can be a more efficient way to read all files
 *       in a directory than `.../path/` or `.../path`.
 *
 * @param path Directory of the input data files; the path can be a comma-separated list of
 *             input paths.
 * @param minPartitions A suggested minimum number of partitions for the input data.
 */
def wholeTextFiles(
    path: String,
    minPartitions: Int = defaultMinPartitions): RDD[(String, String)] = withScope {
  assertNotStopped()
  val job = NewHadoopJob.getInstance(hadoopConfiguration)
  // Use setInputPaths so that wholeTextFiles aligns with hadoopFile/textFile in taking
  // comma separated files as input. (see SPARK-7155)
  NewFileInputFormat.setInputPaths(job, path)
  val updateConf = job.getConfiguration
  new WholeTextFileRDD(
    this,
    classOf[WholeTextFileInputFormat],
    classOf[Text],
    classOf[Text],
    updateConf,
    minPartitions).map(record => (record._1.toString, record._2.toString)).setName(path)
}
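To make the method above concrete, here is a minimal usage sketch that reads a directory with wholeTextFiles and counts the lines of each file; the input path and application name are placeholders, not taken from the original.

import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("whole-text-files-sketch").setMaster("local[*]"))

    // Each element is (file path, entire file content). As documented above,
    // the path argument may also be a comma-separated list of input paths.
    val files = sc.wholeTextFiles("hdfs://a-hdfs-path")

    // Turn each (path, content) pair into a (path, line count) pair.
    val lineCounts = files.mapValues(content => content.split("\n").length)
    lineCounts.collect().foreach { case (p, n) => println(s"$p: $n lines") }

    sc.stop()
  }
}

Because each file becomes a single record, this is convenient for small files but, as the first @note warns, very large files can hurt performance since a whole file must fit in one record.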