Contents

    • I. Installing the DataX environment
    • II. Using DataX
    • III. Installing datax_web
    • IV. Using datax_web

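I. Installing the DataX environment

  1. Install Python 2.7.18 (and configure the environment variables): https://www.python.org/downloads/release/python-2718/
  2. Install JDK 8 (and configure the environment variables)
  3. Install Maven 3.x (and configure the environment variables)
  4. Download and extract datax.tar.gz: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
  5. Download the DataX source code: https://github.com/alibaba/DataX. (The source is needed mainly because some of it has to be modified, rebuilt, and copied over the extracted datax installation, as in the MySQL version mismatch example below.)
    ps1: Let Maven download the dependencies automatically. If hitsdb-client fails to resolve, change the hitsdb-client dependency version to 0.3.7.
    ps2: My project uses MySQL 8, so the MySQL driver dependency of the mysqlreader and mysqlwriter modules has to be changed to 8.0.17; the default is 5.x.
    ps3: Run Maven clean and install on the datax-all module.
    ps4: Copy the target -> datax -> plugin output generated for mysqlreader and mysqlwriter into the directory at the same level as datax\bin, overwriting the existing plugin files.
    ps5: Delete the MySQL 5 driver jars from "datax\plugin\reader\mysqlreader\libs" and "datax\plugin\writer\mysqlwriter\libs".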
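
Before building, it can help to confirm that the tools from steps 1 to 3 are actually reachable from the command line. The following is a minimal sketch, assuming the executables are named python, java, and mvn; the helper name find_on_path is made up for this example, and the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import os


def find_on_path(name, path_env=None, pathext=None):
    """Return the first file called `name` (plus an optional extension) on PATH, or None."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    if pathext is None:
        # On Windows, executables carry an extension such as .EXE or .CMD.
        if os.name == "nt":
            pathext = os.environ.get("PATHEXT", "").split(os.pathsep)
        else:
            pathext = []
    suffixes = [""] + [ext for ext in pathext if ext]
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # Assumed executable names; adjust if your installation uses different ones.
    for tool in ("python", "java", "mvn"):
        print("{0}: {1}".format(tool, find_on_path(tool) or "NOT FOUND, check the environment variables"))
```

A tool reported as not found usually means the corresponding environment variable step was skipped or the cmd window was opened before the variables were saved.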

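II. Using DataX

  1. Open a cmd window
  2. Change to the datax\bin directory
  3. Enter CHCP 65001 to prevent garbled Chinese output
  4. Print a job template: python datax.py -r streamreader -w streamwriter
  5. Run a job script: python datax.py E:\datax\datax\datax\job\job.json
    ps1: Error: the configuration file [E:\datax\datax\datax\plugin\reader\._cassandrareader\plugin.json] does not exist. Please check your configuration file.
      -> Fix: delete the files whose names start with "._" under datax\plugin\writer and datax\plugin\reader
  6. Write your own job script, save it under datax\datax\job, then run it with python datax.py datax\job\your_script.json
    ps1: The script format is documented at https://github.com/alibaba/DataX/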
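
The "._" cleanup from step 5 can be scripted instead of done by hand. This is a rough sketch, not part of DataX: the function name remove_dot_underscore_entries is hypothetical, the install path is the one used in the example command above, and the code runs on Python 2.7 and Python 3. It defaults to a dry run so that nothing is deleted until the list has been reviewed.

```python
from __future__ import print_function
import os


def remove_dot_underscore_entries(plugin_dir, dry_run=True):
    """List entries named '._*' directly under plugin_dir; delete the files unless dry_run."""
    found = []
    for entry in sorted(os.listdir(plugin_dir)):
        if entry.startswith("._"):
            full_path = os.path.join(plugin_dir, entry)
            found.append(full_path)
            if not dry_run and os.path.isfile(full_path):
                os.remove(full_path)
    return found


if __name__ == "__main__":
    datax_home = r"E:\datax\datax\datax"  # path from the example above; adjust to your install
    for sub in ("reader", "writer"):
        plugin_dir = os.path.join(datax_home, "plugin", sub)
        if os.path.isdir(plugin_dir):
            # Switch dry_run to False once the printed list looks right.
            for path in remove_dot_underscore_entries(plugin_dir, dry_run=True):
                print(path)
```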

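    ps2: Basic script parameters (mysql, oracle). In "column", ["*"] selects all columns, but this is not recommended.
{
    "job": {
        "setting": {
            "speed": {
                "channel": "number of concurrent channels (an integer)"
            }
        },
        "content": [
            {
                 "reader": {
                    "name": "fixed plugin name, e.g. mysqlreader",
                    "parameter": {
                        "username": "account",
                        "password": "password",
                        "column": [
                            "column1",
                            "column2",
                            "column3"
                        ],
                        "splitPk": "split column used to shard the data across tasks, so it must be an integer column; the primary key usually works (optional)",
                        "connection": [
                            {
                                "table": ["table"],
                                "jdbcUrl": ["database connection URL"],
                                "querySql": ["a custom query SQL statement; if it is set, table must not be configured or an error is raised (optional; only one of table and querySql may be present)"]
                            }
                        ],
                        "where": "filter condition"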
                    }
                },
                "writer": {
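                    "name": "fixed plugin name, e.g. oraclewriter",
                    "parameter": {
                        "writeMode": "write mode, e.g. insert",
                        "username": "account",
                        "password": "password",
                        "column": [
                            "column1",
                            "column2",
                            "column3"
                        ],
                        "session": [
                            "SQL statement that DataX executes when it obtains the MySQL connection, to change properties of the current connection session (optional)"
                        ],
                        "preSql": [
                            "standard SQL statement executed before data is written to the target table (optional)"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "database connection URL",
                                "table": [
                                    "table"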
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
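
The rule that table and querySql must not both appear in a reader connection is easy to break when editing a job by hand. Here is a small sketch that checks a parsed job for that conflict before it is handed to DataX. The function name find_table_query_conflicts is invented for this example, and it only inspects the job -> content -> reader -> parameter -> connection layout shown in the template above.

```python
def find_table_query_conflicts(job):
    """Return (content_index, connection_index) pairs where table and querySql are both set."""
    conflicts = []
    for content_index, item in enumerate(job["job"]["content"]):
        connections = item["reader"]["parameter"].get("connection", [])
        for connection_index, connection in enumerate(connections):
            # An empty list or a missing key counts as "not configured".
            if connection.get("table") and connection.get("querySql"):
                conflicts.append((content_index, connection_index))
    return conflicts


if __name__ == "__main__":
    sample = {
        "job": {
            "content": [
                {
                    "reader": {
                        "parameter": {
                            "connection": [
                                {"table": ["datax"], "querySql": ["select id from datax"]}
                            ]
                        }
                    }
                }
            ]
        }
    }
    print(find_table_query_conflicts(sample))
```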

ps3: Example: filter the rows of the MySQL table datax and migrate them to Oracle

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                 "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "tiger",
                        "column": [
                            "id",
                            "name",
							"time"
                        ],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "datax"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&useSSL=false&serverTimezone=GMT%2b8"
                                ]
                            }
                        ],
						"where":"id < 10"
                    }
                },
                 "writer": {
                    "name": "oraclewriter",
                    "parameter": {
                        "username": "HYDROPOWER_JIANGXI",
                        "password": "ffcsict123",
                        "column": [
                            "id",
                            "name",
							"time"
                        ],
                        "preSql": [
                            "delete from datax"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:oracle:thin:@192.168.35.9:1521:orcl",
                                "table": [
                                    "datax"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
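
Hand-written job files often fail on small JSON mistakes, so it can be worth parsing the file before launching DataX. The sketch below does that and then builds the same "python datax.py <job file>" command used in section II. The helper names load_job and build_datax_command and the file name mysql_to_oracle.json are assumptions made for illustration, and the install path is the one from the earlier example. The code runs on Python 2.7 and Python 3.

```python
from __future__ import print_function
import io
import json
import os
import subprocess


def load_job(path):
    """Parse a job file; raises ValueError if the file is not valid JSON."""
    with io.open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_datax_command(datax_home, job_path, python_exe="python"):
    """Build the argument list for 'python datax.py <job file>'."""
    return [python_exe, os.path.join(datax_home, "bin", "datax.py"), job_path]


if __name__ == "__main__":
    datax_home = r"E:\datax\datax\datax"  # path from the example above; adjust to your install
    job_path = os.path.join(datax_home, "job", "mysql_to_oracle.json")  # hypothetical file name
    load_job(job_path)  # fail early on malformed JSON
    command = build_datax_command(datax_home, job_path)
    print(" ".join(command))
    print("exit code:", subprocess.call(command))
```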
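  7. Dameng (DM) database: DataX's list of supported databases does not include DM, so the transfer has to use the "generic RDBMS (supports all relational databases)" mode.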
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                 "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "tiger",
                        "column": [
                            "id",
                            "name",
							"time"
                        ],
                        "splitPk": "id",
                        "connection": [
                            {
                                "table": [
                                    "datax"
                                ],
                                "jdbcUrl": [
     "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&useSSL=false&serverTimezone=GMT%2b8"
                                ]
                            }
                        ],
						"where":"id < 10"
                    }
                },
                 "writer": {
                    "name": "rdbmswriter",
                    "parameter": {
                        "username": "AES",
                        "password": "ffcsict123",
                        "column": [
                            "id",
                            "name",
							"time"
                        ],
                        "preSql": [
                            "delete from datax"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:dm://127.0.0.1:5236/AES?zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=utf-8",
                                "table": [
                                    "datax"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
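  8. File stream: only files with a regular two-dimensional table structure can be read, and only local files or files on a remote server; obtaining a stream by calling an API endpoint is not supported.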
{
    "setting": {},
    "job": {
        "setting": {
            "speed": {
                "channel": 2
            }
        },
        "content": [
            {
                "reader": {
                    "name": "ftpreader",
                    "parameter": {
                        "protocol": "sftp",
                        "host": "192.168.248.11",
                        "port": 22,
                        "username": "root",
                        "password": "ffcsict123",
                        "path": [
                            "/root/datax/datax.txt","/root/datax/datax.csv"
                        ],
                        "column": [
                            {
                                "index": 0,
                                "type": "string"
                            },
                            {
                                "index": 1,
                                "type": "string"
                            },
                            {
                                "index": 2,
                                "type": "string"
                            }
                        ],
                        "encoding": "GBK",
                        "fieldDelimiter": ",",
						"skipHeader": "true"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "writeMode": "insert",
                        "username": "root",
                        "password": "tiger",
                        "column": [
                            "id",
                            "name",
							"time"
                        ],
                        "session": [
                        	"set session sql_mode='ANSI'"
                        ],
                        "preSql": [
                            "delete from datax1"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax?useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&useSSL=false&serverTimezone=GMT%2b8",
                                "table": [
                                    "datax1"
                                ]
                            }
                        ]
                    }
                }
            }
        ]
    }
}
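
For files with many columns, typing each index/type entry of the ftpreader "column" list by hand gets tedious. The fragment below is an illustrative sketch that generates the list, assuming every column is read as a string as in the job above. The function name string_columns is made up for this example.

```python
from __future__ import print_function
import json


def string_columns(count):
    """Build a reader 'column' list that reads the first `count` fields as strings."""
    return [{"index": index, "type": "string"} for index in range(count)]


if __name__ == "__main__":
    writer_columns = ["id", "name", "time"]  # the writer columns from the job above
    reader_columns = string_columns(len(writer_columns))
    # Paste the printed list into the reader's "column" field.
    print(json.dumps(reader_columns, indent=4))
```

Deriving the reader list from the writer's column names keeps the two sides the same length, which makes a column-count mismatch less likely.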

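III. Installing datax_web

  1. Download the datax_web source code: https://github.com/WeiYe-Jing/datax-web
    ps1: Create the database: the script is located in datax-web\bin\db
    ps2: Edit the configuration: in application.yml under both datax-admin and datax-executor, uncomment the path variables and ports as described in the file's comments, and adjust the database and directories to your machine.
    ps3: Remember to append "&allowPublicKeyRetrieval=true" to the database URL.
    ps4: In the mail section, simply uncomment the settings; the xx placeholder values can stay.
  2. Download Hadoop 2.7.4 (winutils): https://github.com/vhma/winutils
    ps1: After downloading, extract it and configure the environment variables: set HADOOP_HOME to the extracted directory and add %HADOOP_HOME%/bin to Path.
  3. Start the datax-admin and datax-executor services, then open http://localhost:8080/index.html. Username/password: admin/123456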
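
If the services complain about Hadoop at startup, the HADOOP_HOME setting from step 2 is the first thing to verify. The following is a minimal check, assuming the extracted winutils package contains bin\winutils.exe; the function name check_hadoop_home is hypothetical, and the code runs on Python 2.7 and Python 3.

```python
from __future__ import print_function
import os


def check_hadoop_home(env=None):
    """Return None if HADOOP_HOME looks usable, otherwise a short problem description."""
    if env is None:
        env = os.environ
    home = env.get("HADOOP_HOME")
    if not home:
        return "HADOOP_HOME is not set"
    winutils = os.path.join(home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        return "winutils.exe not found at " + winutils
    return None


if __name__ == "__main__":
    print(check_hadoop_home() or "HADOOP_HOME looks fine")
```

Run it from a freshly opened cmd window, since windows opened before the variable was saved do not see the new value.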

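IV. Using datax_web

  1. Create a project

  2. Create an executor under executor management (the database already contains a default one, which can be used as is)

  3. Create the data sources

  4. Create a DataX task template under task management

  5. Build the task

    Select the source database and the target database, and make sure the fields correspond. A pre-SQL statement can be written to run before execution, for example to clear the table.

    Tick the tables and fields to transfer, matching them in order.

    Then build, click to generate the template, and go to the next step.

  6. The generated task now appears in the task management module. Click start, or run it once, and then check the log to see whether it succeeded.

    If it fails, check the log and review the generated script.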
