When a client connects to a caGrid service, the client and service must both decide to trust each other. If that does not happen, the connection is terminated before the client gets the opportunity to request data or services.
A trust problem usually surfaces as an error message that includes the words "Unknown CA", and it can occur at any point in the interaction between client and service. If the problem occurs very early, while the client is still trying to connect, the error message may instead include "java.lang.NullPointerException". Both symptoms point to a trust-related problem, but the error messages alone do not identify the exact cause.
This page suggests options for correcting the problem.
I just received a grid proxy credential from Dorian, but secure grid services are rejecting it. Why?
There are a number of potential problems to investigate. One basic problem may be that clocks are out of sync between machines on the grid, either between two machines running grid services (when two grid services communicate) or between your machine (as the client) and the server running the grid service you are contacting. The proxy certificate you receive after logging in expires after a maximum of 12 hours. If the clocks are out of sync, the service may report that your proxy certificate has expired.
Additionally, certain services (such as the index service) require that clients requesting advertisement have a valid time range. If the requested time range is off significantly from the index service's system clock, this can be a reason for the index service to reject the advertisement request. For more information on troubleshooting the index service, refer to the question on this page: I'm having problems with advertisement/registration, what should I do?
Make sure that all relevant machines have their clocks set accurately. There are a number of ways to do this (depending on your configuration), but one common way is to synchronize against a Network Time Protocol (NTP) server. NTP configuration is OS-dependent, so check your operating system's administration guide for details.
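To see why clock skew produces an "expired" error, the sketch below checks a credential's validity window against clocks that are offset from the true time. The class and method names are hypothetical and are not part of the caGrid or Globus APIs; only the 12-hour maximum lifetime comes from the text above.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: not part of the caGrid or Globus APIs. */
class ProxyClockSkew {

    /** Returns true if 'now' falls inside [notBefore, notAfter]. */
    static boolean isValidAt(Instant notBefore, Instant notAfter, Instant now) {
        return !now.isBefore(notBefore) && !now.isAfter(notAfter);
    }

    public static void main(String[] args) {
        // A proxy issued at a fixed instant with the 12-hour maximum lifetime.
        Instant issued = Instant.parse("2010-01-01T08:00:00Z");
        Instant expires = issued.plus(Duration.ofHours(12));

        // One hour after issue, according to an accurate clock.
        Instant trueNow = issued.plus(Duration.ofHours(1));

        // A machine whose clock runs 13 hours fast sees the proxy as expired.
        Instant fastClock = trueNow.plus(Duration.ofHours(13));
        // A machine whose clock runs 2 hours slow sees it as not yet valid.
        Instant slowClock = trueNow.minus(Duration.ofHours(2));

        System.out.println("accurate clock: " + isValidAt(issued, expires, trueNow));
        System.out.println("fast clock:     " + isValidAt(issued, expires, fastClock));
        System.out.println("slow clock:     " + isValidAt(issued, expires, slowClock));
    }
}
```

Only the accurate clock accepts the credential. The credential itself is identical in all three cases, which is why fixing the clocks, rather than requesting a new proxy, resolves the problem.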
There is a bug in Globus that occurs when services "activate on startup" under Tomcat: the ServiceHost API returns the wrong information. caGrid services "activate on startup" and use the ServiceHost API to determine their service URL. This information is passed to the Index Service for registration.
The caGrid installer already applies the fourth workaround below (setting "defaultPort" and "defaultProtocol"), but if you didn't run the installer, you can use one of the following workarounds:
- Use the Globus container.
- Use tomcat on port 8443 with https.
- Call a method on your service when you start the container.
- Set the "defaultPort" and "defaultProtocol" parameters in Tomcat's web.xml to match how you are running Tomcat (e.g. 8080 and http).
NOTE: If you are running behind a port-redirecting proxy or firewall, you may also have to add a "proxyPort" and/or "proxyName" attribute to your server.xml. This ensures that the registration code is told the specified port/host when it asks Tomcat for that information. See the Tomcat documentation for more information. For example, this page has some nice information on running Tomcat on port 80 as a non-root user.
There are numerous potential causes, but basically the error is telling you the client can't connect to the server.
Here are some common causes:
- The service URL is not correct. Double check it (protocol, port, etc).
- The service is not running. Verify you can hit the service URL with your web browser.
- The service may be behind a firewall, or your client machine may have firewall restrictions.
- If you are behind a firewall and require a proxy to access the internet, you will need to configure proxy settings on the client.
- You can find information on how to do this on Axis's web site.
- Those properties can be passed as -D options on the commandline, or in code using System.setProperty()
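As a minimal sketch of the System.setProperty() approach, the example below sets the standard JVM proxy properties. The host name and port are placeholders, and the class name is hypothetical; check the Axis documentation to confirm which of these properties your Axis version honors.

```java
/** Illustrative only: the host name and port below are placeholders. */
class ProxySettings {

    /** Sets the standard JVM HTTP and HTTPS proxy properties. */
    static void configure(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    public static void main(String[] args) {
        // Call this before the client opens its first connection.
        configure("proxy.example.org", 3128);
        System.out.println("http.proxyHost = " + System.getProperty("http.proxyHost"));
        System.out.println("http.proxyPort = " + System.getProperty("http.proxyPort"));
    }
}
```

The command-line equivalent is to pass the same names as -D options, for example -Dhttp.proxyHost=proxy.example.org -Dhttp.proxyPort=3128.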
A jar file processes every encrypted message that comes into the service, and its logger is on by default unless you turn it off. To turn it off, add the snippet below to the JBOSS_HOME/server/default/conf/log4j.xml file, in the area where the other categories are listed. This disables the debugging output and stops the chattering log files.
<priority value="OFF" />
Check to make sure that your database is on the port you think it is and that you have configured the database with networking enabled. This paragraph from the MySQL documentation has more details:
Make sure that the server has not been configured to ignore network connections or (if you are attempting to connect remotely) that it has not been configured to listen only locally on its network interfaces. If the server was started with --skip-networking, it will not accept TCP/IP connections at all. If the server was started with --bind-address=127.0.0.1, it will listen for TCP/IP connections only locally on the loopback interface and will not accept remote connections.
To fix this problem, follow these instructions:
Note that on some Linux distributions MySQL is configured so that it will not listen on the network for connections. This is done for security reasons, but it prevents Java from being able to connect to MySQL via the JDBC driver. To fix this, search for your my.cnf file (it is probably in /etc or /etc/sysconfig). There are two ways in which this may be disabled. If you find a directive called skip-networking, comment it out by putting a hash mark (#) in front of it. If you find a directive called bind-address and it is configured to bind only to localhost (127.0.0.1), comment it out by putting a hash mark (#) in front of it. Save the file and then restart MySQL.
If your database is on another machine (not "localhost"), check that no firewall is blocking your database connection:
Check to make sure that there is no firewall blocking access to MySQL. Your firewall may be configured on the basis of the application being executed, or the port number used by MySQL for communication (3306 by default). Under Linux or Unix, check your IP tables (or similar) configuration to ensure that the port has not been blocked. Under Windows, applications such as ZoneAlarm or the Windows XP personal firewall may need to be configured not to block the MySQL port.
More information can be found in the MySQL troubleshooting documentation.
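A quick way to separate network problems from database configuration problems is to test whether a plain TCP connection to the database port succeeds from the machine running the service. The sketch below does only that; it does not log in to MySQL. The class name is hypothetical, and the default of localhost and port 3306 is an assumption you should replace with your own settings.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative only: checks TCP reachability, not database credentials. */
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 3306;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

If this reports false from the service host but true when run on the database host itself, the likely causes are the skip-networking or bind-address settings, or a firewall, as described above.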
I upgraded a service with Introduce 1.2, but now my service doesn't publish service metadata. What is wrong?
You need to patch Introduce 1.2 before performing the upgrade. There is a bug in the Introduce 1.2 release that causes this problem. It has since been fixed. Read more about how to update Introduce here: Introduce 1.2 Software Updates.
What does this error mean? "Missing default constructor? Error was: java.lang.InstantiationException."
Globus is trying to deserialize into a Class that can't be constructed. Check your serialization settings and see here if you are using abstract schema types.
What does "forward references are not supported" or "org.exolab.castor.mapping.MappingException: No class descriptor found for extended class ..." mean?
If you see this error you most likely are using Castor to serialize, and you have a model with inheritance. There is a limitation in Castor where inherited classes must be in a certain order. See information on this here, in the closed caGrid bug. If you are using the caCORE SDK, this is not handled properly, and you will need to correct your mapping file. See the caCORE bug here. Follow the instructions here on how to configure custom Castor mappings.
When I connect to a secure service, I see an exception containing this message: "Caused by: Failure unspecified at GSS-API level Caused by: Unknown CA". What is that from?
This means that there is a trust issue between you and the service. Either you don't trust the CA which signed the credentials the service is using, or the service doesn't trust your credentials, if you are using them. You can run "ant syncWithTrustFabric" from the caGrid directory to synchronize the trust fabric, but make sure you are configured to use the same grid environment as the service you are communicating with.
As of caGrid 1.1, you can also run "ant generateTrustReport" from the gridca project (caGrid/projects/gridca). This will prompt you for a file location in which it will write a text file, and print to the screen, all of the details of the CA which you trust (and detail any potential problems).
You may also verify the issuer of your user certificates and host certificates using functionality provided by Globus:
- Unix / Mac
When I connect to a service I've deployed with my client, the client can't connect and the server outputs the following message: java.io.IOException: Token length 1347375956 > 33554432. What is the problem?
The detailed stacktrace on the server side is the following:
The problem is that the service is running in a secure Globus container (using the https protocol), but the client is trying to connect using the insecure http protocol. Change your client to connect securely by modifying the service URL to use https instead of http.
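The number in the error message is itself a clue. Read as four bytes, 1347375956 spells the ASCII text of an HTTP method, which suggests the secure container interpreted the start of a plain HTTP request as a token length. The sketch below performs that decoding; the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: decodes the "token length" from the error message. */
class TokenLengthDecoder {

    /** Interprets a 32-bit big-endian integer as four ASCII characters. */
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(asAscii(1347375956)); // prints "POST"
    }
}
```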
When I connect to a service I've deployed with my client to either secure Tomcat or JBoss, I get a stacktrace error which starts with something like:
First, make sure that you have configured a secure Globus deployment in Tomcat (manually or using the caGrid Installer) or JBoss; that is, that you ran the "deploySecureTomcat" or "deploySecureJBoss" target when deploying Globus. You can also check $CATALINA_HOME/webapps/wsrf/WEB-INF/etc/globus_wsrf_core/server-config.wsdd to make sure it has the following entry.
If the deployment is secure, the other likely cause is that the "Valve" element was left out when the Tomcat server.xml file (whether Tomcat is standalone or embedded) was edited during deployment. Make sure the correct "Valve" is placed in this file after the "Engine" element, and this error should go away. See http://www.globus.org/toolkit/docs/4.0/common/javawscore/admin-index.html#javawscore-admin-tomcat-deploying for details.
The DiscoveryClient gives me an exception with org.xml.sax.SAXException:SimpleDeserializer, what is that from?
As with most Globus clients, a properly configured client-config.wsdd file must be accessible by the underlying Axis engine. The simplest way to do this is to either run with your $GLOBUS_LOCATION as the "working directory", add $GLOBUS_LOCATION to your classpath, or copy $GLOBUS_LOCATION/client-config.wsdd to your working directory or classpath.
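The sketch below checks the two locations mentioned above, the working directory and the classpath, for a client-config.wsdd file. It is a diagnostic aid only; the class and method names are hypothetical, and it does not reproduce the exact lookup logic inside Axis.

```java
import java.io.File;
import java.net.URL;

/** Illustrative only: reports where client-config.wsdd can be found, if anywhere. */
class ClientConfigCheck {

    static final String CONFIG_NAME = "client-config.wsdd";

    /** Describes where the file was found, or returns null if neither location has it. */
    static String locate(File workingDir, ClassLoader loader) {
        File inWorkingDir = new File(workingDir, CONFIG_NAME);
        if (inWorkingDir.isFile()) {
            return "working directory: " + inWorkingDir.getAbsolutePath();
        }
        URL onClasspath = loader.getResource(CONFIG_NAME);
        if (onClasspath != null) {
            return "classpath: " + onClasspath;
        }
        return null;
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        String found = locate(cwd, ClientConfigCheck.class.getClassLoader());
        System.out.println(found != null ? "Found in " + found : CONFIG_NAME + " not found");
    }
}
```

Run it with the same working directory and classpath as the failing client so that it sees what the client sees.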
If you don't do this, you will most likely see an exception similar to that shown below:
My service client fails with an exception saying "The arguments do not match the signature.; nested exception is: java.lang.IllegalArgumentException: object is not an instance of declaring class" in the stack trace. Why does this happen?
Globus WSRF keeps all libraries for deployed services in a single lib directory in the container. This has the side effect that deploying multiple services to the same container can occasionally cause class version conflicts, especially when a container that has older services deployed is used to deploy newer services. For example, a container to which a caGrid 1.0 data service was deployed cannot be used for deploying a caGrid 1.1 data service because the jar names have changed, and the classloader may pick up the 1.0 jars when attempting to run the 1.1 service. Often, this manifests itself at runtime with a stack trace like the following:
This issue can usually be resolved by deploying the service to a completely clean container to which no services have ever been deployed and running again.
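If a clean container is not an option, it can help to confirm which jar a conflicting class is actually loaded from. The sketch below uses the standard ProtectionDomain and CodeSource APIs for that. The class name JarOrigin is hypothetical, and to be meaningful the check has to run inside the same container and classloader as the service (for example from service code), not from a separate command-line JVM.

```java
import java.security.CodeSource;

/** Illustrative only: reports where a class was loaded from. */
class JarOrigin {

    /** Returns the location a class was loaded from, or a note if the JVM does not report one. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source; probably a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, for example one named in the stack trace.
        String className = args.length > 0 ? args[0] : JarOrigin.class.getName();
        System.out.println(className + " <- " + originOf(Class.forName(className)));
    }
}
```

If the reported location is a jar from the older caGrid release, that jar is shadowing the newer one in the shared lib directory.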
Domain objects returned from my caCORE SDK-backed data service throw an exception when calling getter methods on associations.
When querying a caGrid Data Service backed by the caCORE SDK, the domain objects returned throw an exception when a getter to an associated object is called, similar to the following:
The CQL specification requires that the target object of a query (and only the target) be returned, and that associations are intentionally left unpopulated. When the XML results of a CQL query are deserialized, only the attributes of the target data object are populated, and associations are left as null values. Beans generated by the caCORE SDK versions 3.1 and 3.2/3.2.1 are configured to detect this null condition, and attempt to use the application service API to populate the associated objects on calls to the getter methods. Since the grid client is disconnected from the actual application service backend, this fails with the above exception.
This issue is caused by an HTTP version mismatch. All Globus services and clients require HTTP 1.1. The problem appears when an Apache server is used as a proxy and is a version older than 2.2 (which only supports HTTP 1.0), or more generally when any older HTTP proxy that does not support HTTP 1.1 redirects to the container for your service. The fix is to upgrade the servers so that they use HTTP 1.1.
I can't import the caGrid 1.2 CDS DelegatedCredentialTypes.xsd into my analytical service. It prints out a stacktrace complaining about imported types.
If you import types from File system in Introduce 1.2, and import the projects/cds/schema/CredentialDelegationService/DelegatedCredentialTypes.xsd file, you will see an error from Introduce and a stacktrace on the console:
The work-around is to modify the schema in the cds project build file: projects/cds/build/schema/CredentialDelegationService with the following contents:
Then go to Introduce and import the schema (from the build directory specifically). You will see 4 new namespaces on the left-hand side:
Remove the indicated namespaces above by clicking on the namespace and clicking "Remove" on the right-hand pane.
Keep the CredentialDelegationService namespace, however.
When contacting a Grid service, I receive the following error: Authentication failed. Caused by: Failure unspecified at GSS-API level Caused by: Bad certificate... What is the problem?
The solution is found here: knowledgebase:GSSAPI - Bad Certificate Error Solution
Note: original description can be found on the CCTS wiki: CCTS Troubleshooting
You may notice something in the JBoss logs similar to the following:
This is most likely an anomalous and benign "problem". That is, it should not affect the operation of any applications deployed to JBoss. More information can be found at the following links:
Globus Java SSL client side fails with EOFException with some servers (seen with IIS - .NET interop)
When Globus libraries are used for SSL communication, the default mode is signature. This means that the channel is integrity protected, but not encrypted. The Globus SSL library uses TLS_RSA_WITH_NULL_MD5 for the handshake if only signature is used. Servers that are set up not to accept null ciphers will not allow that algorithm, so the handshake fails with an EOFException. To communicate with such servers, enable encryption on the client. Refer to the Globus Security Documentation for details.
The Globus C SSL client negotiates with the server if the null encryption fails and hence might work against the same server. Please refer to this CaGrid User mailing list thread for details.
After deploying the caGrid transfer service to a container, you will see an INFO message logged that may cause concern. The message indicates that the container has identified a servlet.jar file that is not in the expected location. It is safe to disregard this message.
The maximum number of results that are shown on the caB2B Client Interface & the DCQL Client seems to be 1000, even in cases where it is more than that. What could be the reason?
The setting is on the server side and can not be changed from the client side. However, from the client side, the collection is lazily populated so if you iterate through the result collection then you should get all the results back. In case of the data service the SDK client is embedded in the grid service. So the grid service should return you all the results regardless of the limit set on the database query.
Try changing the resultCountPerQuery property found in the SDK application-config.xml configuration file. This property is set to 1000 by default:
When using WEBSSO why do I get a Null Pointer Exception when retrieving Authentication Providers from Dorian?
This occurs when the Authentication Provider URL is empty. To fix this problem download the latest caGrid 1.3 release which includes the patch.
Why am I getting a "java.io.EOFException at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:56)" error?
This is due to a time discrepancy between your system and Globus. If the time range between the two systems is off significantly the request will be rejected. One common way to ensure accurate time is to use the Network Time Protocol (NTP). You can find information about available NTP time servers at http://www.pool.ntp.org
Why do I get a "java.lang.ClassNotFoundException: org.cagrid.mapping.portal.discovery.CaCoreSDKDataTypeSelectionComponent" error when I try to modify a service in Introduce?
This is because Introduce cannot locate the caCORE DataType Mapping Extension (CDME). CDME is not included with the regular installation of caGrid.
If you do not need the CDME, you can remove the references to CDME from Introduce by doing the following steps:
- Shut down Introduce
- Go to the Introduce Extensions directory (e.g. c:\caGrid\projects\introduce\extensions)
- Delete the mapping directory
If you do need to use CDME, you'll need to re-install it. The installation instructions can be found in the CDME Installation Guide (http://cagrid.org/display/cdme/Installation+Guide)
This is probably because Tomcat was not restarted after the service was deployed.
What is the correct URL for the CQLQuery schema mentioned in http://cagrid.org/display/dataservices/CQL+Schemas?
The XSD contents are seen on that web page. The "canonical" location for them would be to download them from the GME, but if you just want an actual file to use, you can reference them from our software release (caGrid/projects/cql/schema/cql1.0/) or SVN:https://ncisvn.nci.nih.gov/svn/cagrid/tags/caGrid-1_3_release_final/cagrid-1-0/caGrid/projects/cql/schema/cql1.0/
This is because WSRF creates its own JNDI registry. Grid services only access the JNDI registry created by WSRF, not the JBoss JNDI. This article explains how to configure a JNDI datasource in the WSRF JNDI registry: http://cagrid.org/display/knowledgebase/Configuring+a+Globus+JNDI+Datasource+for+a+caGrid+Service
When deploying a secure Tomcat container in an environment without internet access, deployment fails because it can't access the DTD referenced in a file.
This is a known issue that has been fixed in caGrid 1.4 (and the 1.3 SNAPSHOT release). Please update the tomcat/build.xml by replacing the <xmlcatalog> with this... (note the change to the location attributes to use "..")
A bug was discovered in the handling of Date attributes in CQL. To fix this problem, download the latest caGrid 1.3 release, which includes the patch.
You need to update the cagrid/projects/introduce/antfiles/introduce-utils.xml file so that Introduce can use more memory.
Change the first two jvmarg values to suit your service.
This is due to a problem with GTS's TrustedAuthorityManager code. To fix this problem download the latest caGrid 1.3 release which includes the patch.
This might be due to a bug in CQL date attribute. To fix this problem download the latest caGrid 1.3 release which includes the patch.
I get an "Authentication failed. Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]" error when using a non-secure DCQL non-secure client.
To fix this problem sync with the trust fabric. Refer to https://cagrid.org/display/gts13/SyncGTS
When attempting to authenticate a user, I'm getting an error message in the format of : The Assertion is not valid at <time1>, the assertion is valid from <time2> to <time3>
If you look at the error message closely, you'll notice that time1 is not in the range of time2 to time3. This is due to a time discrepancy between your system and Globus. One common way to ensure accurate time on your system is to use the Network Time Protocol (NTP). You can find information about available NTP time servers at http://www.pool.ntp.org