This article walks through problems I ran into while using Kylin, with the cause of each and how I resolved it.
My kylin.properties configuration:
### SERVICE ###

# Kylin server mode, valid value [all, query, job]
kylin.server.mode=all

# Optional information for the owner of kylin platform, it can be your team's email
# Currently it will be attached to each kylin's htable attribute
[email protected]

# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=192.168.64.16:7070

# Display timezone on UI, format like [GMT+N or GMT-N]
kylin.rest.timezone=GMT+8

### SOURCE ###

# Hive client, valid value [cli, beeline]
kylin.hive.client=cli

# Parameters for beeline client, only necessary if hive client is beeline
#kylin.hive.beeline.params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u 'jdbc:hive2://localhost:10000'

kylin.hive.keep.flat.table=false

### STORAGE ###

# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase
kylin.storage.url=hbase

# In milliseconds (2 days)
kylin.storage.cleanup.time.threshold=172800000

# Working folder in HDFS, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin

# Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.hbase.default.compression.codec=none

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
# Leave empty if hbase running on same cluster with hive and mapreduce
kylin.hbase.cluster.fs=hdfs://master1:8020

# The cut size for hbase region, in GB.
kylin.hbase.region.cut=5

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
# Set 0 to disable this optimization.
kylin.hbase.hfile.size.gb=2

kylin.hbase.region.count.min=1
kylin.hbase.region.count.max=500

### JOB ###

# max job retry on error, default 0: no retry
kylin.job.retry=0

kylin.job.jar=$KYLIN_HOME/lib/kylin-job-1.5.4.jar
kylin.coprocessor.local.jar=$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.jar

# If true, job engine will not assume that hadoop CLI reside on the same server as it self
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)
kylin.job.run.as.remote.cmd=false

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
kylin.job.remote.cli.port=22

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=

# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin

# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10

# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10

# Hive database name for putting the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default

# The percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100

# Whether get job status from resource manager with kerberos authentication
kylin.job.status.with.kerberos=false

kylin.job.mapreduce.default.reduce.input.mb=500
kylin.job.mapreduce.max.reducer.number=500
kylin.job.mapreduce.mapper.input.rows=1000000
kylin.job.step.timeout=7200

### CUBE ###

# 'auto', 'inmem', 'layer' or 'random' for testing
kylin.cube.algorithm=auto
kylin.cube.algorithm.auto.threshold=8
kylin.cube.aggrgroup.max.combination=4096
kylin.dictionary.max.cardinality=5000000
kylin.table.snapshot.max_mb=300

### QUERY ###

kylin.query.scan.threshold=10000000

# 3G
kylin.query.mem.budget=3221225472
kylin.query.coprocessor.mem.gb=3

# Enable/disable ACL check for cube query
kylin.query.security.enabled=true
kylin.query.cache.enabled=true

### SECURITY ###

# Spring security profile, options: testing, ldap, saml
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
kylin.security.profile=testing

### SECURITY ###

# Default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN

# LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=

# LDAP user account directory;
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=

# LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=

## SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

### MAIL ###

# If true, will send email notification;
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

### WEB ###

# Help info, format{name|displayName|link}, optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|

# Guide user how to build streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/

# Hadoop url link, optional
kylin.web.hadoop=

#job diagnostic url link, optional
kylin.web.diagnostic=

#contact mail on web page, optional
kylin.web.contact_mail=

crossdomain.enable=true
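1. Running ./bin/find-hive-dependency.sh
This script checks whether the Hive environment is configured correctly. It reported that the HCAT_HOME path could not be found.
Fix: export HCAT_HOME=$HIVE_HOME/hcatalog
Then rerun the script.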
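The fix above can be wrapped in a small guard so that an existing HCAT_HOME is not overwritten. This is only a sketch: the function name ensure_hcat_home is made up for illustration, and it assumes, as in the fix above, that HCatalog lives under $HIVE_HOME/hcatalog.

```shell
# Hypothetical helper (not part of Kylin): set HCAT_HOME from HIVE_HOME
# only when HCAT_HOME is not already defined, then print the value in use.
ensure_hcat_home() {
  if [ -z "${HCAT_HOME:-}" ]; then
    if [ -z "${HIVE_HOME:-}" ]; then
      echo "HIVE_HOME is not set; cannot derive HCAT_HOME" >&2
      return 1
    fi
    # Assumption taken from the fix above: hcatalog sits inside HIVE_HOME
    export HCAT_HOME="$HIVE_HOME/hcatalog"
  fi
  echo "$HCAT_HOME"
}
```

A possible use is to call ensure_hcat_home in the current shell and then run ./bin/find-hive-dependency.sh again.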
2. Loading a Hive table in the Kylin web UI fails with "failed to take action".
Fix:
vi ./bin/kylin.sh
Make two changes to this script:
1. export KYLIN_HOME=/home/grid/kylin # change to an absolute path
2. export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX # add $hive_dependency to the path
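When HBASE_CLASSPATH_PREFIX is empty beforehand, the export above ends with a stray colon. A minimal sketch of one way to avoid that follows; prepend_path is a hypothetical helper name and is not something kylin.sh provides.

```shell
# Hypothetical helper: prepend entries to a colon-separated path value,
# omitting the separator when the existing value is empty.
prepend_path() {
  new_entries="$1"
  existing="$2"
  if [ -n "$existing" ]; then
    printf '%s:%s\n' "$new_entries" "$existing"
  else
    printf '%s\n' "$new_entries"
  fi
}
```

Inside kylin.sh this could be used as export HBASE_CLASSPATH_PREFIX="$(prepend_path "${tomcat_root}/lib/*:$hive_dependency" "$HBASE_CLASSPATH_PREFIX")", with the remaining Tomcat jars added to the first argument in the same way.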
3. How to add a login user to Kylin
The official docs give the approach: Kylin uses the Spring Security framework for user authentication, so you need to edit the sandbox,testing section of ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/classes/kylinSecurity.xml:
<beans profile="sandbox,testing">
    <scr:authentication-manager alias="authenticationManager">
        <scr:authentication-provider>
            <scr:user-service>
                ...
                <scr:user name="ADMIN" password="$2a$10$o3ktIWsGYxXNuUWQiYlZXOW5hWcqyNAFQsSSCSEWoC/BRVMAUjL32" authorities="ROLE_MODELER, ROLE_ANALYST, ROLE_ADMIN" />
                <scr:user name="xxx" password="xxx" authorities="ROLE_MODELER, ROLE_ANALYST, ROLE_ADMIN" />
                ...
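The password value must be a hash produced by Spring Security's BCrypt encoder, not plain text. Add the dependency and generate the hash as follows: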
<dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-core</artifactId>
    <version>4.0.0.RELEASE</version>
</dependency>
// Plain-text password to hash
String password = "123456";
org.springframework.security.crypto.password.PasswordEncoder encoder = new org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder();
// BCrypt uses a random salt, so the printed hash differs on every run
String encodedPassword = encoder.encode(password);
System.out.print(encodedPassword);
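4. Building a cube fails with FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
The error is baffling, and kylin.log does not show the root cause. You have to look in the log that Hive is configured to write (set in log4j; the default directory is /tmp/$user/). There the cause turned out to be the error message: "Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z"
So this is a compression format problem: by default Kylin uses Snappy, not Hadoop's LZO compression.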
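Finding the root cause in the Hive log can be sketched as below. The function name show_hive_errors is made up for illustration, and the default file name hive.log under /tmp/$USER/ is an assumption based on Hive's usual log4j defaults; pass the real path if your log4j settings differ.

```shell
# Hypothetical helper: print the most recent error lines from the Hive log.
# Assumes the log is /tmp/<user>/hive.log unless a path is given as $1.
show_hive_errors() {
  log_file="${1:-/tmp/${USER:-$(id -un)}/hive.log}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # -n prefixes each match with its line number; keep only the last 20 matches
  grep -n -E 'Error|Exception' "$log_file" | tail -n 20
}
```

Running show_hive_errors right after a failed build step should surface messages such as the NativeCodeLoader.buildSupportsSnappy error quoted above, if they were written to that file.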
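There are 3 possible solutions:
1. Redeploy with apache-kylin-1.5.2.1-HBase1.x-bin.tar.gz instead of apache-kylin-1.5.2.1-bin.tar.gz. I was running HBase 0.98, so I ruled this one out.
2. Switch to LZO compression. This is a bit more work; for details see http://kylin.apache.org/docs15/install/advance_settings.html
3. Turn off compression for Hive and HBase (cube builds may take longer; evaluate this for your own workload). Grep for snappy in conf/kylin.properties and conf/*.xml, then delete every snappy and compress setting.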
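For the third option, the search step can be sketched as follows. find_compression_settings is a hypothetical helper name; it only lists candidate lines and leaves the actual deletion to you.

```shell
# Hypothetical helper: list every line in Kylin's conf directory that
# mentions snappy or compress, with file name and line number.
# $1 is the conf directory, e.g. "$KYLIN_HOME/conf".
find_compression_settings() {
  conf_dir="$1"
  grep -n -i -E 'snappy|compress' "$conf_dir"/kylin.properties "$conf_dir"/*.xml
}
```

For example, find_compression_settings "$KYLIN_HOME/conf" prints one file:line:text entry per match, which you can then review and remove by hand.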
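5. Step 3 of the cube build, Extract Fact Table Distinct Columns, fails with java.net.ConnectException: Call From master1/192.168.64.11 to localhost:18032 failed on connection exception: java.net.ConnectException: Connection refused
Solution:
This issue cost me far too much time. Everything I found online blamed the YARN port configuration, but the error persisted after I modified yarn-site.xml. I then suspected the Hive metastore server, but changing that made no difference either.
In the end I had no choice but to switch to HBase 1.1.6, which also requires the Kylin build that matches HBase 1.x. That solved the problem.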
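6. After the cube is built successfully, SQL queries fail with error in coprocessor
Fix:
This problem plagued me for several days. I had already set kylin.coprocessor.local.jar=/../kylin/lib/kylin-coprocessor-1.5.4.jar. The fix suggested online is to set hbase_dependency=<absolute path>/hbase-1.1.6/lib in the find-hbase-dependency.sh script, but that did not seem to help.
What finally worked was deleting all of the HBase data on HDFS and restarting HBase. I suspect something went wrong during cube creation; I will look into it later.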
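7. About Kylin SQL
While working through problems with count distinct, we found that Kylin SQL differs from ordinary SQL in the following ways:
- limit beg, end is not supported; only limit length is.
- union and union all are not supported.
- The where exists clause is not supported.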
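8. Cleaning up Kylin's intermediate storage data
Kylin generates a lot of intermediate data on HDFS while building a cube. In addition, when we build, drop, or merge a cube, some HBase tables may be left behind in HBase even though they will never be queried again. We therefore need to clean up this offline storage from time to time. The steps are as follows:
1. Check which resources can be cleaned up. This operation does not delete anything:
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete false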
Time taken: 1.339 seconds
OK
kylin_intermediate_kylin_sales_cube_desc_2b8ea1a6_99f4_4045_b0f5_22372b9ffc60
kylin_intermediate_weibo_cube_0d26f9e5_0935_409a_9a6d_6c1d03773fbd
kylin_intermediate_weibo_cube_1d21fe49_990c_4a34_9267_e693421689f2
Time taken: 0.33 seconds, Fetched: 3 row(s)
------ Intermediate Hive Tables To Be Dropped ------
----------------------------------------------------
2016-10-12 15:24:25,881 INFO [main CubeManager:132]: Initializing CubeManager with config kylin_metadata@hbase
2016-10-12 15:24:25,897 INFO [main CubeManager:828]: Loading Cube from folder kylin_metadata(key='/cube')@kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main CubeDescManager:91]: Initializing CubeDescManager with config kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main CubeDescManager:197]: Reloading Cube Metadata from folder kylin_metadata(key='/cube_desc')@kylin_metadata@hbase
2016-10-12 15:24:26,035 DEBUG [main CubeDescManager:222]: Loaded 2 Cube(s)
2016-10-12 15:24:26,038 DEBUG [main CubeManager:870]: Reloaded new cube: userlog_cube with reference beingCUBE[name=userlog_cube] having 1 segments:KYLIN_WEK77BKP6M
2016-10-12 15:24:26,040 DEBUG [main CubeManager:870]: Reloaded new cube: weibo_cube with reference beingCUBE[name=weibo_cube] having 1 segments:KYLIN_5N8ZRC7Z1F
2016-10-12 15:24:26,040 INFO [main CubeManager:841]: Loaded 2 cubes, fail on 0 cubes
2016-10-12 15:24:26,218 INFO [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-779df736-75b0-4263-b045-6a49401b4516 from deletion list, as the path belongs to segment userlog_cube[19700101000000_20160930000000] of cube userlog_cube
2016-10-12 15:24:26,218 INFO [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-e9805d06-559a-4c15-ab1e-d6e947460093 from deletion list, as the path belongs to segment weibo_cube[19700101000000_20140430000000] of cube weibo_cube
--------------- HDFS Path To Be Deleted ---------------
/kylin/kylin_metadata/kylin-07e8f9b1-8dfc-4c57-8e5b-e9800392af0d
/kylin/kylin_metadata/kylin-0855f8ed-89a5-4676-a9bb-f8c301ead327
/kylin/kylin_metadata/kylin-0cdef491-d0b7-438d-ba54-091678cb463d
/kylin/kylin_metadata/kylin-121752c8-ab9d-434b-812f-73f766796436
/kylin/kylin_metadata/kylin-12b442a0-0c6d-43e7-830f-2f6e5826f23a
/kylin/kylin_metadata/kylin-5ba7affe-d584-4f6e-85b2-2588e31a985c
/kylin/kylin_metadata/kylin-5e1818bd-4644-4e8e-b332-b5bb59ff9677
/kylin/kylin_metadata/kylin-680f7549-48be-496a-82c5-084434bfee74
/kylin/kylin_metadata/kylin-707d1a65-392e-456f-97ea-d7d553b52950
/kylin/kylin_metadata/kylin-7520fc6e-8b76-43cc-9fb8-bfba969040da
/kylin/kylin_metadata/kylin-75e5b484-4594-4d31-83ce-729a6b3de1c2
/kylin/kylin_metadata/kylin-79535d79-cd36-4711-858c-d8fa28266f7f
/kylin/kylin_metadata/kylin-81eb9119-c806-4003-a6d6-fc43281a8c01
/kylin/kylin_metadata/kylin-839e80d8-d116-4061-80d6-379c85db7114
/kylin/kylin_metadata/kylin-843b185d-ed09-48c7-958c-1ee1e0e2cde5
/kylin/kylin_metadata/kylin-97c0cdc6-c53e-4115-995e-b90f4381d307
/kylin/kylin_metadata/kylin-998aa0aa-279c-44f0-8367-807b9110ae74
/kylin/kylin_metadata/kylin-ad2ad0c7-bee5-46f2-9fc3-e60b10941ffa
/kylin/kylin_metadata/kylin-b5939b9b-2a6e-4acb-aaf7-888a83113ad7
/kylin/kylin_metadata/kylin-b65b555d-90e5-4455-95ce-10b215b00482
/kylin/kylin_metadata/kylin-d5ac36b3-b021-4ac6-87ae-f3a38f90eb06
/kylin/kylin_metadata/kylin-e7a9b0d1-a788-4ddf-88f5-37671eaa7dc3
/kylin/kylin_metadata/kylin-f7094827-00f8-474b-9542-ea001797a148
-------------------------------------------------------
2016-10-12 15:24:26,475 INFO [main StorageCleanupJob:91]: Exclude table KYLIN_WEK77BKP6M from drop list, as it is newly created
2016-10-12 15:24:26,475 INFO [main StorageCleanupJob:102]: Exclude table KYLIN_5N8ZRC7Z1F from drop list, as the table belongs to cube weibo_cube with status READY
--------------- Tables To Be Dropped ---------------
----------------------------------------------------
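2. The output above lists the tables and files in Hive, HDFS, and HBase that can be deleted (tables that were recently created or recently queried are filtered out automatically). Review the output to confirm that these tables are really no longer needed. Once you are sure, rerun the command from step 1 with "--delete false" changed to "--delete true" to perform the cleanup:
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true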
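The two-step procedure can be sketched as a single wrapper that defaults to the safe dry run. Both the function name kylin_storage_cleanup and the KYLIN_RUNNER variable are made up for illustration (KYLIN_RUNNER exists only so the wrapper can be tried out without a Kylin install); the class name and --delete flag are the ones used in the commands above.

```shell
# Hypothetical wrapper around StorageCleanupJob.
# With no argument it runs in dry-run mode (--delete false);
# pass "true" explicitly to actually delete.
kylin_storage_cleanup() {
  mode="${1:-false}"
  case "$mode" in
    true|false) ;;
    *) echo "usage: kylin_storage_cleanup [true|false]" >&2; return 2 ;;
  esac
  # KYLIN_RUNNER is an illustrative override; by default use Kylin's launcher
  runner="${KYLIN_RUNNER:-${KYLIN_HOME}/bin/kylin.sh}"
  "$runner" org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete "$mode"
}
```

Calling kylin_storage_cleanup lists what would be removed, and kylin_storage_cleanup true performs the deletion once the list has been reviewed. Making the dry run the default means a mistyped or missing argument never deletes anything.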