I'm following the article "Developing AWS Glue ETL jobs locally using a container" on the AWS Big Data Blog.
It was written two years ago against amazon/aws-glue-libs:glue_libs_1.0.0_image_01, but it mentions that version 2 and version 3 images will follow as Glue is upgraded.
I have everything working by following the article, but I need version 2 of Glue.
When setting up PyCharm, use python3 for the Python interpreter path.
I'm trying to create PyGlue.zip from the GitHub distribution: awslabs/aws-glue-libs, branch glue-2.0.
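For reference, this is roughly how the zip could be built by hand. It is only a sketch and assumes that PyGlue.zip just needs the awsglue package directory from the repository checkout at the top level of the archive; the function name build_pyglue_zip and its parameters are made up for illustration and are not part of aws-glue-libs.

```python
import os
import zipfile


def build_pyglue_zip(repo_dir, zip_name="PyGlue.zip", package="awsglue"):
    """Zip repo_dir/package into repo_dir/zip_name.

    Assumption: the archive only needs the package directory, stored
    under its own name so that `import awsglue` works from the zip.
    """
    zip_path = os.path.join(repo_dir, zip_name)
    package_dir = os.path.join(repo_dir, package)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the checkout, e.g. awsglue/context.py
                zf.write(full_path, os.path.relpath(full_path, repo_dir))
    return zip_path
```

Calling build_pyglue_zip("/home/glue_user/aws-glue-libs") would write the archive to the location used in the PYTHONPATH, assuming the glue-2.0 branch is checked out there.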
I set PYTHONPATH=/home/glue_user/aws-glue-libs/PyGlue.zip:/home/glue_user/spark/python/lib/py4j-0.10.7-src.zip:/home/glue_user/spark/python to match the container.
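A quick way to confirm that every PYTHONPATH entry actually exists inside the container is sketched below. The helper name missing_pythonpath_entries is hypothetical; it only splits the variable on the platform path separator and checks each entry on disk.

```python
import os


def missing_pythonpath_entries(pythonpath, exists=os.path.exists):
    """Return the entries of a PYTHONPATH-style string that do not exist.

    Entries are split on os.pathsep (":" on Linux, which is what the
    container uses). `exists` can be swapped out for testing.
    """
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if not exists(p)]


if __name__ == "__main__":
    # Run this with the same interpreter and environment as the Glue script.
    for entry in missing_pythonpath_entries(os.environ.get("PYTHONPATH", "")):
        print("missing:", entry)
```

If this prints nothing, all three paths resolve, and the problem is more likely on the JVM side than in the Python setup.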
Has anyone got past this using the container glue_libs_2.0.0_image_01?
This is what I get now:
Traceback (most recent call last):
  File "/c/Users/diuppa/git/dynamo-psql-migration/pyspark/08-dynamoToRDS-Carriers/dynamoToRDS-Carriers.py", line 35, in <module>
    sc = SparkContext()
  File "/home/glue_user/spark/python/pyspark/context.py", line 136, in __init__
    conf, jsc, profiler_cls)
  File "/home/glue_user/spark/python/pyspark/context.py", line 198, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/glue_user/spark/python/pyspark/context.py", line 306, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/glue_user/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
  File "/home/glue_user/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: io.netty.util.concurrent.SingleThreadEventExecutor.<init>(Lio/netty/util/concurrent/EventExecutorGroup;Ljava/util/concurrent/Executor;ZLjava/util/Queue;Lio/netty/util/concurrent/RejectedExecutionHandler;)V
    at io.netty.channel.SingleThreadEventLoop.<init>(SingleThreadEventLoop.java:65)
    at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:138)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:146)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:37)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:86)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:81)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:68)
    at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:50)
    at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:102)
    at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
    at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
    at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
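
A java.lang.NoSuchMethodError generally means a class was loaded from a different library version than the one the calling code was compiled against, so one thing worth checking here is whether more than one netty version ends up on the classpath. The sketch below is only a diagnostic aid and not a confirmed fix: the function name conflicting_netty_jars is hypothetical, the jar directories must be passed in by hand, and it assumes jar files follow the usual name-version.jar naming.

```python
import os
import re
import sys
from collections import defaultdict

# Matches names such as netty-all-4.1.17.Final.jar or netty-3.9.9.Final.jar
NETTY_JAR = re.compile(r"^(netty[\w-]*?)-(\d[\w.]*)\.jar$")


def conflicting_netty_jars(jar_dirs):
    """Return {artifact: [versions]} for netty jars present in more than one version."""
    versions = defaultdict(set)
    for jar_dir in jar_dirs:
        if not os.path.isdir(jar_dir):
            continue  # skip directories that do not exist
        for name in os.listdir(jar_dir):
            match = NETTY_JAR.match(name)
            if match:
                versions[match.group(1)].add(match.group(2))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    # Usage: python find_netty.py <jar_dir> [<jar_dir> ...]
    for artifact, found in conflicting_netty_jars(sys.argv[1:]).items():
        print(artifact, "->", ", ".join(found))
```

The directories to pass are whichever ones Spark and the Glue libraries load jars from in the container; that layout is not shown in the traceback, so it has to be looked up in the image.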