Tide


The Tide system is a BitTorrent-like, grid-based parallel data transfer solution.

Overview


The Tide system is designed as a BitTorrent-like, grid-based parallel data transfer solution. It borrows some aspects of the overall BitTorrent model but differs in others. The key difference, other than using grid-based protocols to negotiate all transfers, is that the grid currently does not have the massive ad hoc user community that you might find in BitTorrent. The data our user community wants to transfer can, at this point, be read only by certain people from a certain group or with certain credentials. With the set of potential data consumers limited in this way, there is not much use, at least early on, for supporting swarm-style data transfer, in which data is exchanged peer to peer among consumers who are retrieving the same data at the same time. Judging from our use cases, this scenario is unlikely to occur.

Tide therefore attempts to be very simple in the way that it stores, publishes, and advertises data replicas. There are no fixed chunk size requirements and no requirement that all of the data actually exist anywhere.

The Tide system enables a Tide Descriptor to be created that describes the tide and the currents (chunks) that make up the tide. This Descriptor is then published to a Tide Replica Manager, which manages a list of potential Tide Servers from which these data pieces can be consumed. The pieces can then be consumed in any order and in any proportion from any or all of the potential Tide Servers by any retrieval algorithm on the client side. This flexibility in storage and consumption allows our user community to design solutions that best suit the predicted storage and retrieval patterns of their respective communities, while still gaining the essential parallelism of multiple senders and one or more retrievers over the internet using standard HTTP protocols.

Architecture


Tide is composed of three main components: the TideService, the TideReplicaManagementService, and Tide, the client software. The TideService is the service that maintains a Tide (the descriptor of the data and the actual data pieces, also known as currents). A Tide can be generated from any data. To publish a Tide, one must first create a TideDescriptor. The TideDescriptor contains a TideInformation and a list of Currents. The TideInformation is the metadata about the Tide, such as its name, length, md5sum, and description. The list of Currents is the metadata about each data piece, such as its length, its md5sum, and the offset at which the piece sits in the original data. Once the TideDescriptor has been created, it can be used to generate a new Tide through the TideService. This creation operation returns a TransferServiceContextReference, which can then be used to upload the actual data to the TideService.
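As a rough illustration of the metadata described above, the following Python sketch splits a byte string into pieces of arbitrary sizes and records the length, md5sum, and offset of each. It is not the caGrid API: the `Current` and `TideInformation` classes and the `describe` function are hypothetical names chosen for this example, and only the field names follow the schema.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Current:
    chunk_num: int  # position of this piece in the list of currents
    offset: int     # byte offset of the piece in the original data
    size: int       # length of the piece in bytes
    md5sum: str     # hex MD5 digest of the piece


@dataclass
class TideInformation:
    name: str
    size: int       # total length of the data in bytes
    md5sum: str     # hex MD5 digest of the whole data
    chunks: int     # number of currents
    description: str = ""


def describe(name: str, data: bytes, sizes: List[int]) -> Tuple[TideInformation, List[Current]]:
    """Split `data` into pieces of the given sizes and describe each one.

    Hypothetical helper: the piece sizes do not have to be equal, which
    mirrors the absence of a fixed chunk size requirement.
    """
    if sum(sizes) != len(data):
        raise ValueError("piece sizes must add up to the data length")
    currents, offset = [], 0
    for num, size in enumerate(sizes):
        piece = data[offset:offset + size]
        currents.append(Current(num, offset, size, hashlib.md5(piece).hexdigest()))
        offset += size
    info = TideInformation(name, len(data), hashlib.md5(data).hexdigest(), len(currents))
    return info, currents


# Three pieces of unequal size covering a 16-byte payload.
info, currents = describe("example", b"hello tide world", [5, 1, 10])
```

The per-piece md5sum lets a client verify each current on its own, while the md5sum in the TideInformation covers the reassembled whole.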

The TideReplicaManager is a service that acts as a registry for Tides. It enables TideDescriptors to be hosted along with the hosts that claim to have the particular Tide data available. Using this registry to track the replicas that may exist enables the Tide client software to use that information for retrieval.

The image below shows an example of querying the TideReplicaManager for a particular Tide: retrieving the TideInformation and using it to get a list of the hosts that claim to have that Tide. With this information, the Tide client can retrieve the Tide using one of several retrieval algorithms.
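The retrieval step can be sketched in Python as follows. This is only one possible algorithm (a simple round-robin over hosts), not one of the algorithms shipped with Tide. The `retrieve` function is hypothetical, and the replicas are modeled as in-memory dictionaries standing in for servers that would really be contacted over HTTP. Each piece is checked against its md5sum before being written at its offset, so pieces may arrive in any order and from any host.

```python
import hashlib
from itertools import cycle


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def retrieve(total_size, currents, replicas):
    """Fetch every current from some replica and reassemble the data.

    currents: list of (chunk_num, offset, size, md5sum) tuples.
    replicas: dict mapping a host name to {chunk_num: bytes}; an assumed
              in-memory stand-in for real Tide servers.
    """
    result = bytearray(total_size)
    hosts = cycle(sorted(replicas))
    # Walk the currents in reverse to show that the order does not matter.
    for chunk_num, offset, size, md5sum in reversed(currents):
        # Try each host at most once for this current.
        for _ in range(len(replicas)):
            piece = replicas[next(hosts)].get(chunk_num)
            if piece is not None and len(piece) == size and md5(piece) == md5sum:
                result[offset:offset + size] = piece
                break
        else:
            raise LookupError(f"no replica could supply current {chunk_num}")
    return bytes(result)


data = b"hello tide world"
pieces = [data[0:5], data[5:6], data[6:16]]
currents = [(i, off, len(p), md5(p))
            for i, (off, p) in enumerate(zip([0, 5, 6], pieces))]
replicas = {
    "host-a": {0: pieces[0], 2: pieces[2]},  # each host holds only part of the tide
    "host-b": {1: pieces[1], 2: pieces[2]},
}
restored = retrieve(len(data), currents, replicas)
```

Neither host in this example holds the complete data, yet the client can still rebuild it, which is the situation the descriptor-plus-registry design allows for.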

Schema


<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://tide.cagrid.org/TideDescriptor"
	xmlns:tns="http://tide.cagrid.org/TideDescriptor"
	xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
	xmlns:transferTypes="http://transfer.cagrid.org/TransferService/Context/types"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
	attributeFormDefault="unqualified">

	<import namespace="http://schemas.xmlsoap.org/ws/2004/03/addressing"
		schemaLocation="../ws/addressing/WS-Addressing.xsd">
	</import>


	<xs:import
		namespace="http://transfer.cagrid.org/TransferService/Context/types"
		schemaLocation="./TransferServiceContextTypes.xsd">
	</xs:import>

	<element name="TideInformation" type="tns:TideInformation"></element>

	<complexType name="TideInformation">
		<xs:attribute type="string" name="id" use="required"></xs:attribute>
		<xs:attribute type="string" name="name" use="required"></xs:attribute>
		<xs:attribute type="string" name="description"></xs:attribute>
		<xs:attribute type="string" name="type"></xs:attribute>
		<xs:attribute type="long" name="size" use="required"></xs:attribute>
		<xs:attribute type="string" name="md5sum" use="required"></xs:attribute>
		<xs:attribute type="int" name="chunks" use="required"></xs:attribute>
	</complexType>


	<element name="TideDescriptor" type="tns:TideDescriptor"></element>

	<complexType name="TideDescriptor">
		<xs:sequence>
		    <element ref="tns:TideInformation" minOccurs="1" maxOccurs="1"></element>
			<element ref="tns:Currents" minOccurs="1"
				maxOccurs="1">
			</element>
		</xs:sequence>
	</complexType>

	<element name="Currents" type="tns:Currents"></element>
	<complexType name="Currents">
		<xs:sequence>
			<element ref="tns:Current" minOccurs="1"
				maxOccurs="unbounded">
			</element>
		</xs:sequence>
	</complexType>

	<element name="Current" type="tns:Current"></element>
	<complexType name="Current">
		<xs:attribute type="int" name="chunkNum"></xs:attribute>
		<xs:attribute type="long" name="offset"></xs:attribute>
		<xs:attribute type="string" name="md5sum"></xs:attribute>
		<xs:attribute type="long" name="size"></xs:attribute>
	</complexType>

	<element name="WaveRequest" type="tns:WaveRequest"></element>
	<complexType name="WaveRequest">
		<sequence>
			<element ref="tns:Current" minOccurs="1"
				maxOccurs="unbounded">
			</element>
		</sequence>
		<xs:attribute type="string" name="tideId"></xs:attribute>
	</complexType>

	<element name="WaveDescriptor" type="tns:WaveDescriptor"></element>

	<complexType name="WaveDescriptor">
		<xs:sequence>
			<xs:element
				ref="transferTypes:TransferServiceContextReference">
			</xs:element>
			<element ref="tns:Current" minOccurs="1"
				maxOccurs="unbounded">
			</element>
		</xs:sequence>
		<xs:attribute type="string" name="tideId"></xs:attribute>
	</complexType>

	<xs:element name="TideReplicasDescriptor"
		type="tns:TideReplicasDescriptor">
	</xs:element>

	<complexType name="TideReplicasDescriptor">
		<xs:sequence>
			<xs:element ref="tns:TideDescriptor" minOccurs="1"
				maxOccurs="1">
			</xs:element>
			<xs:element ref="tns:TideReplicaDescriptor" minOccurs="1"
				maxOccurs="unbounded">
			</xs:element>

		</xs:sequence>
	</complexType>

	<xs:element name="TideReplicaDescriptor"
		type="tns:TideReplicaDescriptor">
	</xs:element>

	<complexType name="TideReplicaDescriptor">
		<xs:sequence>
			<xs:element ref="wsa:EndpointReference" minOccurs="1"
				maxOccurs="1">
			</xs:element>
		</xs:sequence>
		<xs:attribute type="int" name="unreachableCount"></xs:attribute>
	</complexType>

</schema>
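
To show what an instance of this schema might look like, the following Python sketch builds a TideDescriptor element with the standard library's `xml.etree.ElementTree`. The element and attribute names come from the schema above. The `build_descriptor` helper, the `tide` prefix, and all attribute values (including the zero-filled md5sum placeholders) are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://tide.cagrid.org/TideDescriptor"
ET.register_namespace("tide", NS)  # prefix chosen for this example only


def q(tag: str) -> str:
    """Qualify an element name with the schema's target namespace."""
    return f"{{{NS}}}{tag}"


def build_descriptor(info: dict, currents: list) -> ET.Element:
    """Hypothetical helper: build a TideDescriptor element tree."""
    root = ET.Element(q("TideDescriptor"))
    # Elements are namespace-qualified (elementFormDefault="qualified"),
    # attributes are not (attributeFormDefault="unqualified").
    ET.SubElement(root, q("TideInformation"), {k: str(v) for k, v in info.items()})
    wrapper = ET.SubElement(root, q("Currents"))
    for cur in currents:
        ET.SubElement(wrapper, q("Current"), {k: str(v) for k, v in cur.items()})
    return root


root = build_descriptor(
    {"id": "tide-1", "name": "example", "size": 16,
     "md5sum": "0" * 32, "chunks": 2},  # placeholder values
    [{"chunkNum": 0, "offset": 0, "size": 5, "md5sum": "0" * 32},
     {"chunkNum": 1, "offset": 5, "size": 11, "md5sum": "0" * 32}],
)
xml_text = ET.tostring(root, encoding="unicode")
```

The resulting `xml_text` holds one TideInformation element followed by a Currents element wrapping the individual Current entries, matching the sequence order in the TideDescriptor type.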

Components


TideReplicaManager

The TideReplicaManager is a service that acts as a registry for Tides. It enables TideDescriptors to be hosted along with the hosts that claim to have the particular Tide data available. Using this registry to track the replicas that may exist enables the Tide client software to use that information for retrieval.
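
The registry role can be pictured with a small in-memory Python sketch. It is not the service's actual interface: the `ReplicaRegistry` class and its method names are hypothetical. Using the schema's `unreachableCount` attribute to order hosts is also an assumption, since this page does not say how that counter is used.

```python
from collections import defaultdict


class ReplicaRegistry:
    """Hypothetical in-memory stand-in for the registry role described above."""

    def __init__(self):
        self._descriptors = {}              # tide id -> descriptor
        self._replicas = defaultdict(dict)  # tide id -> {host: unreachable count}

    def publish(self, tide_id, descriptor):
        self._descriptors[tide_id] = descriptor

    def add_replica(self, tide_id, host):
        """Record that `host` claims to have the data for `tide_id`."""
        if tide_id not in self._descriptors:
            raise KeyError(f"unknown tide: {tide_id}")
        self._replicas[tide_id].setdefault(host, 0)

    def report_unreachable(self, tide_id, host):
        self._replicas[tide_id][host] += 1

    def lookup(self, tide_id):
        """Return the descriptor and its hosts, least often unreachable first."""
        descriptor = self._descriptors[tide_id]
        hosts = sorted(self._replicas[tide_id].items(), key=lambda kv: (kv[1], kv[0]))
        return descriptor, [host for host, _ in hosts]


registry = ReplicaRegistry()
registry.publish("tide-1", {"name": "example"})
registry.add_replica("tide-1", "host-a")
registry.add_replica("tide-1", "host-b")
registry.report_unreachable("tide-1", "host-a")
descriptor, hosts = registry.lookup("tide-1")
```

In this sketch a client would try `hosts` in the returned order, so a host that has been reported unreachable is still listed but is tried last.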

TideServer

The TideService is the service that maintains a Tide (the descriptor of the data and the actual data pieces, also known as currents). A Tide can be generated from any data. To publish a Tide, one must first create a TideDescriptor. The TideDescriptor contains a TideInformation and a list of Currents. The TideInformation is the metadata about the Tide, such as its name, length, md5sum, and description. The list of Currents is the metadata about each data piece, such as its length, its md5sum, and the offset at which the piece sits in the original data. Once the TideDescriptor has been created, it can be used to generate a new Tide through the TideService. This creation operation returns a TransferServiceContextReference, which can then be used to upload the actual data to the TideService.

Software


The software is currently available in HEAD of the caGrid repository. Tide should be available in the next release of caGrid (1.3) or in an earlier technology preview release.

Last edited by
Sarah Honacki (824 days ago) , ...